Use selfhosted S3 build cache #724

Open · wants to merge 68 commits into master
Conversation


@chippmann (Contributor) commented Oct 12, 2024

TL;DR

  • New caching for our repo backed by self-hosted S3
  • Forks keep using the existing GitHub caching
  • Improved CI/CD pipeline overall

Overview

We regularly run into cache misses on our CI/CD pipeline and thus unnecessarily recompile a lot of code, because our available cache size is "only" 10GB.
Into that we have to fit all the intermediate files needed for all compilations, for all targets and sub-targets, as well as Gradle and its dependencies. Thus we regularly reach an over-allocation of around 60GB before GitHub starts deleting some of our cache again.

To gain more control and flexibility regarding caching (especially with the goal of supporting even more targets and platforms in the future, like Linux arm64), I opted to implement a self-hosted S3 cache using MinIO, with a fallback to regular GitHub caching for forks which do not have access to the needed credentials in our runner secrets.
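
To make the fork fallback concrete, here is a minimal Kotlin sketch of that decision, assuming the MinIO Java client (io.minio) and hypothetical secret/env names; the real selection happens in the workflow files and may look different:

```kotlin
import io.minio.MinioClient

// Hypothetical env/secret names; the actual names in our runner secrets may differ.
fun s3CacheClientOrNull(): MinioClient? {
    val endpoint = System.getenv("S3_CACHE_ENDPOINT") ?: return null
    val accessKey = System.getenv("S3_CACHE_ACCESS_KEY") ?: return null
    val secretKey = System.getenv("S3_CACHE_SECRET_KEY") ?: return null
    return MinioClient.builder()
        .endpoint(endpoint)
        .credentials(accessKey, secretKey)
        .build()
}

fun main() {
    val client = s3CacheClientOrNull()
    if (client == null) {
        // Fork without access to the runner secrets:
        // fall back to the regular GitHub Actions cache.
        println("No S3 credentials available, falling back to GitHub caching")
    } else {
        println("Using the self-hosted S3 (MinIO) cache")
    }
}
```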

With this we can cache at a much finer granularity, as I assigned a cap of 1TB for now (not that we would need that much anyway).

(Screenshot: 2024-10-12-150012_138x36_scrot)
Currently we already sit at around 10GB for all the caches of only one branch.

As we now have far more storage at our disposal, I opted for a cache per git ref (branch, tag); if none is present yet, the cache from master is copied. This means we can now cache on a per-branch basis, which should further improve our pipeline and give us more control when we need it.
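
For illustration only, the per-ref lookup with the master fallback boils down to something like the following Kotlin sketch (again using the MinIO Java client; the bucket name, object key layout and helper are hypothetical, not the actual workflow code):

```kotlin
import io.minio.CopyObjectArgs
import io.minio.CopySource
import io.minio.DownloadObjectArgs
import io.minio.MinioClient
import io.minio.StatObjectArgs
import io.minio.errors.ErrorResponseException

// Hypothetical bucket and key layout: one cache archive per git ref and build target.
const val BUCKET = "build-cache"

fun restoreCache(client: MinioClient, ref: String, target: String, destination: String) {
    val refKey = "$ref/$target.tar.gz"
    val masterKey = "master/$target.tar.gz"

    try {
        // Prefer the cache of the current ref (branch or tag) if it already exists.
        client.statObject(StatObjectArgs.builder().bucket(BUCKET).`object`(refKey).build())
    } catch (e: ErrorResponseException) {
        // No cache for this ref yet: seed it by copying master's cache
        // (assumes master has been built and cached at least once).
        client.copyObject(
            CopyObjectArgs.builder()
                .bucket(BUCKET)
                .`object`(refKey)
                .source(CopySource.builder().bucket(BUCKET).`object`(masterKey).build())
                .build()
        )
    }

    // Download the (possibly just seeded) ref cache to the runner.
    client.downloadObject(
        DownloadObjectArgs.builder()
            .bucket(BUCKET)
            .`object`(refKey)
            .filename(destination)
            .build()
    )
}
```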

This all comes with one downside though: as the cache is located at my home and I do not have a business internet contract, my upload is capped at 80-100 Mbit/s while my download is capped at 1 Gbit/s. From the GitHub runner's point of view the performance is inverted: the runner's cache download is quite a bit slower than from GitHub's own cache (which is around 120-150 Mbit/s, so we are roughly 20-70 Mbit/s slower). Upload performance is not affected though.
But overall, IMO, this is negligible. In practice it only noticeably affects the dev and debug builds of the editors, by around a minute each, as their caches are around 1GB each.
I still expect the overall performance to be better, as we should no longer have any cache misses.

I also took the opportunity to rework some other parts of our CI/CD pipeline:

  • Android and iOS now also build and cache per target and architecture
  • The MoltenVK build for iOS is now also cached (although the build does not fully respect the cache and rebuilds smaller parts of it anyway, it still reduces build time)
  • We now only check out the Godot code if we really need it
  • We now also cache intermediate Gradle build files (previously we only cached Gradle itself and our JVM build dependencies, but not the actual build outputs, which still led to a clean build each time)
  • For Godot builds on Windows and macOS we now explicitly pin the runner OS version instead of using latest. For all other builds we keep using latest

Finally, it's worth noting that the overall speed of the pipeline depends on a lot of factors, so this PR is not a silver bullet for our pipeline! It should, however, make it a lot more stable with regard to caching.

Other research work

I actually started this work by using self-hosted runners instead of GitHub-hosted ones. But in short: even though they use much more powerful hardware than we get for free, we run so many jobs in parallel that a few fast self-hosted runners can never compete with the vast number of slower runners we get from GitHub. So even with 3 Linux, 3 Windows and 1 macOS runner self-hosted, we were a lot slower than with the GitHub-hosted runners.

And managing the cache and the dependencies on these was not worth it.

Thus I switched to focusing on avoiding cache misses rather than on pure build performance.

But that work is not in vain. Once we revive the work done in #640 we can use these runners again for benchmarking, as there we need controlled consistency. So my setup at home remains ready for self-hosted runners once needed.

Other S3 service providers

I also looked at some readily available S3 providers and services like Infomaniak or Amazon AWS, but they are either too limited in their access restrictions or just way too expensive for our use case.

@chippmann requested review from CedNaru and piiertho on October 12, 2024 at 13:24