Use selfhosted S3 build cache #724

Open · wants to merge 68 commits into master
Conversation


@chippmann (Contributor) commented Oct 12, 2024

TL;DR

  • New caching for our repo backed by self-hosted S3
  • Forks keep using the existing GitHub caching
  • Improved CI/CD pipeline overall

Overview

We regularly run into cache misses on our CI/CD pipeline and thus unnecessarily recompile a lot of code, because our available cache size is "only" 10GB.
Into that we have to fit all the intermediate files needed for all compilations, for all targets and sub-targets, as well as Gradle and its dependencies. Thus we regularly reach an over-allocation of around 60GB before GitHub starts deleting some of our cache again.

To gain more control and flexibility regarding caching (especially with the goal of supporting even more targets and platforms in the future, like Linux arm64), I opted to implement a self-hosted S3 cache using MinIO, with a fallback to regular GitHub caching for forks which do not have access to the needed credentials in our runner secrets.
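
To make the fork fallback concrete, here is a minimal Kotlin sketch of that decision, assuming the MinIO Java client (io.minio) and hypothetical secret/env names; the real selection happens in the workflow files and may look different:

```kotlin
import io.minio.MinioClient

// Hypothetical env/secret names; the actual names in our runner secrets may differ.
fun s3CacheClientOrNull(): MinioClient? {
    val endpoint = System.getenv("S3_CACHE_ENDPOINT") ?: return null
    val accessKey = System.getenv("S3_CACHE_ACCESS_KEY") ?: return null
    val secretKey = System.getenv("S3_CACHE_SECRET_KEY") ?: return null
    return MinioClient.builder()
        .endpoint(endpoint)
        .credentials(accessKey, secretKey)
        .build()
}

fun main() {
    val client = s3CacheClientOrNull()
    if (client == null) {
        // Fork without access to the runner secrets:
        // fall back to the regular GitHub Actions cache.
        println("No S3 credentials available, falling back to GitHub caching")
    } else {
        println("Using the self-hosted S3 (MinIO) cache")
    }
}
```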

With this we can cache at a much finer granularity, as I assigned a cap of 1TB for now (not that we would need that much anyway).

(Screenshot: 2024-10-12-150012_138x36_scrot)
Currently we already sit at around 10GB for all the caches of only one branch.

As we now have far more storage at our disposal, I opted for a cache per git ref (branch, tag); if none is present yet, the cache from master is copied. This means we can now cache on a per-branch basis, which should further improve our pipeline and give us more control when we need it.
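
For illustration only, the per-ref lookup with the master fallback boils down to something like the following Kotlin sketch (again using the MinIO Java client; the bucket name, object key layout and helper are hypothetical, not the actual workflow code):

```kotlin
import io.minio.CopyObjectArgs
import io.minio.CopySource
import io.minio.DownloadObjectArgs
import io.minio.MinioClient
import io.minio.StatObjectArgs
import io.minio.errors.ErrorResponseException

// Hypothetical bucket and key layout: one cache archive per git ref and build target.
const val BUCKET = "build-cache"

fun restoreCache(client: MinioClient, ref: String, target: String, destination: String) {
    val refKey = "$ref/$target.tar.gz"
    val masterKey = "master/$target.tar.gz"

    try {
        // Prefer the cache of the current ref (branch or tag) if it already exists.
        client.statObject(StatObjectArgs.builder().bucket(BUCKET).`object`(refKey).build())
    } catch (e: ErrorResponseException) {
        // No cache for this ref yet: seed it by copying master's cache
        // (assumes master has been built and cached at least once).
        client.copyObject(
            CopyObjectArgs.builder()
                .bucket(BUCKET)
                .`object`(refKey)
                .source(CopySource.builder().bucket(BUCKET).`object`(masterKey).build())
                .build()
        )
    }

    // Download the (possibly just seeded) ref cache to the runner.
    client.downloadObject(
        DownloadObjectArgs.builder()
            .bucket(BUCKET)
            .`object`(refKey)
            .filename(destination)
            .build()
    )
}
```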

This all comes with one downside though: as the cache is located at my home and I do not have a business internet contract, my upload is capped at 80-100 Mbit/s while my download is capped at 1 Gbit/s. From the GitHub runner's point of view the performance is inverted: the runner's cache download is quite a bit slower than from GitHub's own cache (which is around 120-150 Mbit/s, so we are roughly 20-70 Mbit/s slower). Upload performance is not affected though.
But overall, IMO, this is negligible. In practice it only noticeably affects the dev and debug builds of the editors, by around a minute each, as their caches are around 1GB each.
I still expect the overall performance to be better, as we should no longer have any cache misses.

I also took the opportunity to rework some other parts of our CI/CD pipeline:

  • Android and iOS now also build and cache per target and architecture
  • The MoltenVK build for iOS is now also cached (although the build does not fully respect the cache and rebuilds smaller parts of it anyway, it still reduces build time)
  • We now only check out the Godot code if we really need it
  • We now also cache intermediate Gradle build files (previously we only cached Gradle itself and our JVM build dependencies, but not the actual build outputs, which still led to a clean build each time)
  • For Godot builds on Windows and macOS we now explicitly pin the runner OS version instead of using latest. For all other builds we keep using latest

Finally, it's worth noting that the overall speed of the pipeline depends on a lot of factors, so this PR is not a silver bullet for our pipeline! It should, however, make it a lot more stable with regard to caching.

Other research work

I actually started this work by using self-hosted runners instead of GitHub-hosted ones. But in short: even though they use much more powerful hardware than we get for free, we run so many jobs in parallel that a few fast self-hosted runners can never compete with the vast number of slower runners we get from GitHub. So even with 3 Linux, 3 Windows and 1 macOS runner self-hosted, we were a lot slower than with the GitHub-hosted runners.

And managing the cache and the dependencies on these was not worth it.

Thus I switched to focusing on avoiding cache misses rather than on pure build performance.

But that work is not in vain. Once we revive the work done in #640 we can use these runners again for benchmarking, as there we need controlled consistency. So my setup at home remains ready for self-hosted runners once needed.

Other S3 service providers

I also looked at some readily available S3 providers and services like Infomaniak or Amazon AWS, but they are either too limited in their access restrictions or just way too expensive for our use case.

@chippmann requested review from CedNaru and piiertho on October 12, 2024 at 13:24