Commit c8b09f5

Cache bazel builds with --disk_cache using GitHub cache action. (#9695)
This PR addresses issue #9659 by caching Bazel build artifacts in a `--disk_cache` directory, which is saved and restored with the GitHub `cache` action. This should allow PRs from external repositories (i.e. forks) to benefit from build caching, since the remote cache is only enabled for PRs from within the `pytorch/xla` repository.

In summary, this PR adds the following cache behavior:

1. Every commit pushed to `master` or any release candidate branch (e.g. `rX.X`) will:
    - Create a new disk cache
    - Populate the cache as it builds PyTorch/XLA
    - Save the populated disk cache, associating it with the current branch and the current commit
2. Every PR on branch `X` (either `master` or a release candidate branch) will:
    - Try to restore the cache associated with `X` at the commit it's trying to merge with
    - Build PyTorch/XLA using the restored disk cache
    - If we don't actually find a cache to restore, we won't use a disk cache

Note that we only have 10GB of cache storage. So, in order to minimize cache usage, I made the following decisions:

- The disk caches created in (1) won't restore any cache in the beginning
    - Smaller caches
- The disk caches restored in (2) won't be saved in the end
    - Fewer caches
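The matching between (1) and (2) can be sketched in a few lines of Python. This is only an illustration of the key scheme, assuming the two key expressions used in the workflow (`runner.os`, branch name, commit SHA); the function names and sample values are hypothetical and not part of the PR.

```python
def save_key(runner_os: str, ref_name: str, sha: str) -> str:
    """Key under which a push to a branch saves its disk cache."""
    return f"{runner_os}-{ref_name}-{sha}"


def restore_key(runner_os: str, base_ref: str, base_sha: str) -> str:
    """Key a pull request looks up: base branch plus base commit."""
    return f"{runner_os}-{base_ref}-{base_sha}"


# Hypothetical values, for illustration only.
pushed = save_key("Linux", "master", "abc1234")
pr_on_same_commit = restore_key("Linux", "master", "abc1234")
pr_on_other_commit = restore_key("Linux", "master", "0000000")

# Same branch and commit: the keys are identical, so the cache can be restored.
assert pushed == pr_on_same_commit
# Different base commit: no matching key, so the PR builds without a disk cache.
assert pushed != pr_on_other_commit
```

Since the restore step does not list any fallback keys, a PR only gets a cache when a push build for exactly that base commit has already saved one.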
1 parent af2a47a commit c8b09f5

3 files changed: +45 -0 lines changed

.github/workflows/_build_torch_xla.yml

Lines changed: 37 additions & 0 deletions
@@ -34,6 +34,7 @@ jobs:
     env:
       GCLOUD_SERVICE_KEY: ${{ secrets.gcloud-service-key }}
       BAZEL_REMOTE_CACHE: ${{ github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository }}
+      BAZEL_DISK_CACHE_PATH: "${{ github.workspace }}/disk-cache"
       BAZEL_JOBS: "" # Let bazel decide the parallelism based on the number of CPUs.
       BUILD_CPP_TESTS: 1
     steps:
@@ -46,27 +47,63 @@ jobs:
           sparse-checkout: |
             .github/workflows/setup
           path: .actions
+
       - name: Setup
         if: inputs.has_code_changes == 'true'
         uses: ./.actions/.github/workflows/setup
+
+      # Restore the disk cache associated with the base branch and the commit SHA
+      # that was used for merging with the current pr.
+      - name: Retrieve disk cache
+        id: cache
+        # Only runs for 'pull_request' events.
+        # We want to create a new disk cache on 'push' events.
+        if: github.event_name == 'pull_request' && inputs.has_code_changes == 'true'
+        uses: actions/cache/restore@v4
+        with:
+          path: ${{ env.BAZEL_DISK_CACHE_PATH }}
+          key: ${{ runner.os }}-${{ github.base_ref }}-${{ github.event.pull_request.base.sha }}
+
       - name: Build
         if: inputs.has_code_changes == 'true'
         shell: bash
+        env:
+          # Only actually build with the disk cache if:
+          #
+          # 1. This is not a 'pull_request' event, e.g.: 'push'
+          # 2. We did restore a cache in the previous step
+          #
+          # Otherwise, (e.g. a 'pull_request' event that didn't find a cache
+          # to restore) it doesn't make sense to use the disk cache.
+          BAZEL_DISK_CACHE_PATH: ${{ (github.event_name != 'pull_request' || steps.cache.outputs.cache-hit) && env.BAZEL_DISK_CACHE_PATH }}
         run: |
           cd pytorch/xla/infra/ansible
           ansible-playbook playbook.yaml -vvv -e "stage=build arch=amd64 accelerator=tpu src_root=${GITHUB_WORKSPACE} bundle_libtpu=0 build_cpp_tests=1 git_versioned_xla_build=1 cache_suffix=-ci" --skip-tags=fetch_srcs,install_deps
+
       - name: Upload wheel
         if: inputs.has_code_changes == 'true'
         uses: actions/upload-artifact@v4
         with:
           name: torch-xla-wheels
           path: /dist/*.whl
+
       - name: Upload CPP test binaries
         if: inputs.has_code_changes == 'true'
         uses: actions/upload-artifact@v4
         with:
           name: cpp-test-bin
           path: /tmp/test/bin
+
+      # Save the disk cache, associating it to the current branch and commit SHA.
+      - name: Save disk cache
+        # Only create new caches only on 'push' events, so that pull requests that
+        # can take advantage of those.
+        if: github.event_name == 'push' && inputs.has_code_changes == 'true'
+        uses: actions/cache/save@v4
+        with:
+          key: ${{ runner.os }}-${{ github.ref_name }}-${{ github.sha }}
+          path: ${{ env.BAZEL_DISK_CACHE_PATH }}
+
       - name: Report no code changes
         if: inputs.has_code_changes == 'false'
         run: |
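The `BAZEL_DISK_CACHE_PATH` expression in the Build step is compact, so here is a rough Python model of how it appears to evaluate. It assumes that `||` and `&&` in workflow expressions return one of their operands (much like Python's `or` and `and`), and that `cache-hit` is `'true'` on an exact key match and empty when nothing was restored. The function name and sample path are hypothetical.

```python
def effective_disk_cache_path(event_name: str, cache_hit: str, path: str) -> str:
    """Approximate model of:
    (github.event_name != 'pull_request' || cache-hit) && env.BAZEL_DISK_CACHE_PATH
    """
    return (event_name != "pull_request" or cache_hit) and path


path = "/workspace/disk-cache"  # hypothetical location

# 'push' events always build with the disk cache, so it can be saved afterwards.
assert effective_disk_cache_path("push", "", path) == path
# Pull requests use it only if the restore step found a cache.
assert effective_disk_cache_path("pull_request", "true", path) == path
# No cache restored: the variable ends up empty.
assert effective_disk_cache_path("pull_request", "", path) == ""
```

In the last case the step-level `env` entry overrides the job-level value with an empty string, which is how the build is told not to use a disk cache.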

build_util.py

Lines changed: 4 additions & 0 deletions
@@ -46,6 +46,10 @@ def bazel_options_from_env() -> Iterable[str]:
   if check_env_flag('XLA_CPU_USE_ACL'):
     bazel_flags.append('--config=acl')

+  disk_cache = os.getenv('BAZEL_DISK_CACHE_PATH')
+  if disk_cache is not None:
+    bazel_flags.append('--disk_cache=%s' % disk_cache)
+
   return bazel_flags

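To see what the added lines do for the three states the variable can be in (unset, set to a path, set to an empty string), here is a small standalone sketch. `disk_cache_flags` is a hypothetical helper that copies only the new logic; it takes the environment as a parameter so it can be run without touching `os.environ`.

```python
import os
from typing import List, Mapping, Optional


def disk_cache_flags(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Mirror of the added lines: append --disk_cache whenever the variable is set."""
    env = os.environ if env is None else env
    flags: List[str] = []
    disk_cache = env.get("BAZEL_DISK_CACHE_PATH")
    if disk_cache is not None:
        flags.append("--disk_cache=%s" % disk_cache)
    return flags


print(disk_cache_flags({}))                                            # unset
print(disk_cache_flags({"BAZEL_DISK_CACHE_PATH": "/tmp/disk-cache"}))  # path
print(disk_cache_flags({"BAZEL_DISK_CACHE_PATH": ""}))                 # empty
```

Note that an empty value still produces a `--disk_cache=` flag, because the check is `is not None` rather than a truthiness test. The `setup.py` comment says an empty value means no local disk cache, so this presumably relies on Bazel treating an empty `--disk_cache=` as disabled; that is an assumption worth verifying against the Bazel documentation.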

setup.py

Lines changed: 4 additions & 0 deletions
@@ -34,6 +34,10 @@
 # BAZEL_REMOTE_CACHE=""
 #   whether to use remote cache for builds
 #
+# BAZEL_DISK_CACHE_PATH=""
+#   path to the bazel disk cache to use for caching builds. If this is empty, the
+#   build won't use a local disk cache.
+#
 # TPUVM_MODE=0
 #   whether to build for TPU
 #
