@iwknow
Collaborator

@iwknow iwknow commented Oct 20, 2025

Add helper functions `getDefaultXLAGenerator` and `createXLAGenerator` to the XLA random number generator.

These helper functions will be used with the XLA hook later.

Refer to #9159
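
For readers unfamiliar with this kind of helper pair, here is a rough illustration of the usual pattern, not the code in this PR. It assumes the two helpers follow the common "shared default per device" versus "fresh instance" split; the `Generator` class, the snake_case function names, and the `device_index` parameter are all hypothetical stand-ins written in Python for brevity.

```python
import threading


class Generator:
    """Hypothetical stand-in for a per-device RNG state holder."""

    def __init__(self, device_index, seed=0):
        self.device_index = device_index
        self.seed = seed


# Cache of default generators, keyed by device index (assumed design).
_default_generators = {}
_lock = threading.Lock()


def get_default_generator(device_index=0):
    """Return the shared default generator for a device, creating it on first use."""
    with _lock:
        gen = _default_generators.get(device_index)
        if gen is None:
            gen = Generator(device_index)
            _default_generators[device_index] = gen
        return gen


def create_generator(device_index=0):
    """Return a fresh, independent generator; it is never cached."""
    return Generator(device_index)
```

Under these assumptions, repeated calls to `get_default_generator(0)` return the same object, while every call to `create_generator(0)` returns a new one.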

@iwknow
Collaborator Author

iwknow commented Oct 22, 2025

@qihqi do you know why I always get a build timeout? It builds successfully on my local machine, and I can't find any clue about the cause in the build log. Please take a look.

@iwknow
Collaborator Author

iwknow commented Oct 24, 2025

@ysiraichi @qihqi could you please take a look? Thanks!

@ysiraichi
Collaborator

> do you know why do i always get build timeout?

This might be because your PR does not come from a branch on this PyTorch/XLA repository.
That causes the CI not to use the remote cache.
I'm currently working on using the GitHub cache to mitigate that (#9659).

Collaborator

@ysiraichi ysiraichi left a comment

Thank you for the PR. I think that it looks good overall.
Could you add a few C++ tests and check everything is working?

@ysiraichi ysiraichi mentioned this pull request Oct 28, 2025
2 tasks
@iwknow iwknow requested a review from ysiraichi October 29, 2025 17:51
@iwknow
Collaborator Author

iwknow commented Oct 30, 2025

Strangely, I am no longer able to build //test/cpp:test_xla_generator. The command I use is `bazel build //test/cpp:test_xla_generator --experimental_ui_max_stdouterr_bytes=-1` (the last flag lets Bazel print its output without a length limit). The error I get is:

bazel-out/k8-opt/bin/_solib_k8/_U_A_Atorch_S_S_Clibc10___Ubuild_Slib/libc10.so: error: undefined reference to 'log', version 'GLIBC_2.29'
/usr/local/lib/libpython3.10.so: error: undefined reference to 'sem_clockwait', version 'GLIBC_2.30'
bazel-out/k8-opt/bin/_solib_k8/_U_A_Atorch_S_S_Clibc10___Ubuild_Slib/libc10.so: error: undefined reference to 'pthread_cond_clockwait', version 'GLIBC_2.30'
collect2: error: ld returned 1 exit status
Target //test/cpp:test_xla_generator failed to build

@ysiraichi do you have any clue about this issue? I was previously able to build and run //test/cpp:test_xla_generator.

@ysiraichi
Collaborator

Not sure what happened there.
It looks like you have an old glibc version in your system.
Are you using the docker image?

@iwknow
Collaborator Author

iwknow commented Oct 30, 2025

> Not sure what happened there. It looks like you have an old glibc version in your system. Are you using the docker image?

I am using the tpu-contributor dev container. I get the following when I run `ldd --version`:

ldd (Debian GLIBC 2.31-13+deb11u11) 2.31
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

Could this be related to the build cache? Or is the build incompatible with my local PyTorch checkout? Should I sync my local PyTorch to HEAD? Also, I don't have pytorch-xla installed locally as a Python package; I am not sure whether that is related.
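
One way to sanity-check the "old glibc" theory is to compare the highest `GLIBC_x.y` symbol version named in the linker errors with the version that `ldd --version` reports. The sketch below does that for the two outputs pasted in this thread; the helper names are made up for illustration, and it assumes the version is the last token on the first line of the `ldd` output, as it is here.

```python
import re

# Matches the symbol-version suffix in linker errors such as:
#   undefined reference to 'log', version 'GLIBC_2.29'
_VERSION_RE = re.compile(r"version 'GLIBC_(\d+(?:\.\d+)+)'")

LINKER_OUTPUT = """\
bazel-out/k8-opt/bin/_solib_k8/_U_A_Atorch_S_S_Clibc10___Ubuild_Slib/libc10.so: error: undefined reference to 'log', version 'GLIBC_2.29'
/usr/local/lib/libpython3.10.so: error: undefined reference to 'sem_clockwait', version 'GLIBC_2.30'
bazel-out/k8-opt/bin/_solib_k8/_U_A_Atorch_S_S_Clibc10___Ubuild_Slib/libc10.so: error: undefined reference to 'pthread_cond_clockwait', version 'GLIBC_2.30'
"""

LDD_OUTPUT = "ldd (Debian GLIBC 2.31-13+deb11u11) 2.31\n"


def parse_version(text):
    """Turn '2.31' into (2, 31) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def required_glibc(linker_output):
    """Highest GLIBC symbol version mentioned in the linker errors, or None."""
    versions = [parse_version(v) for v in _VERSION_RE.findall(linker_output)]
    return max(versions) if versions else None


def installed_glibc(ldd_output):
    """Version from the last token on the first line of `ldd --version`, or None."""
    lines = ldd_output.splitlines()
    parts = lines[0].split() if lines else []
    token = parts[-1] if parts else ""
    return parse_version(token) if re.fullmatch(r"\d+(\.\d+)+", token) else None


if __name__ == "__main__":
    need, have = required_glibc(LINKER_OUTPUT), installed_glibc(LDD_OUTPUT)
    print("required:", need, "installed:", have, "new enough:", have >= need)
```

For the outputs above, the highest required version is 2.30 and the installed one is 2.31, so the container's glibc itself looks new enough. That would point toward the link step resolving against a different, older glibc (for example a toolchain sysroot or stale cached artifacts), though that is only a guess.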

@ysiraichi
Collaborator

Try recompiling everything from scratch: PyTorch and PyTorch/XLA (clean the cache).

ysiraichi added a commit that referenced this pull request Oct 31, 2025
PRs from external repositories are timing out on the `_build_torch_xla.yml`
workflow. That's because in those cases, [the remote cache is
disabled][1]. Without it, [the fixed 45-minute timeout][2] is no longer
enough. See, for example, PR #9682, which fails due to this timeout.

Here's my plan to address this issue:

- Bump the timeout by 5 minutes (this PR)
- Create a disk-cache using GitHub cache actions for reducing build time
on PRs from external repositories (see [#9659][3] for more information)

This PR will go through the following steps:

- [x] Reproduce the CI build timeout 
- [x] Bump the timeout by 5 minutes

[1]:
https://github.com/pytorch/xla/blob/df6798dfb931ce7c7fe5bed2447cd1092a5981af/.github/workflows/_build_torch_xla.yml#L36
[2]:
https://github.com/pytorch/xla/blob/df6798dfb931ce7c7fe5bed2447cd1092a5981af/.github/workflows/build_and_test.yml#L44
[3]: #9659