perf(ci): use self-hosted macos runner if available, fallback to public runners if not by mikehardy · Pull Request #587 · ankidroid/Anki-Android-Backend

mikehardy · 2025-10-17T01:31:35Z

GitHub macos runners are performing incredibly badly at the moment

This change lets us use Tartelet / Tart to run a self-hosted runner, with fallback to public runners if it is offline

https://josephduffy.co.uk/posts/self-hosting-macos-github-runners

local app that does the VM clone/start/GH-register/destroy - https://github.com/shapehq/tartelet
local VM system built on macOS virtualization - https://github.com/cirruslabs/tart

If this should be unwound:

remove the github app for it in ankidroid org app settings
remove the org level secret with token used to access the github API to query for runner status

mikehardy · 2025-10-17T01:56:49Z

Success on the first run, how about that.

Took 22mins to run the macos tests on build-quick workflow, of which 3'23" was restoring rust cache, which was a 2GB network transfer (the largest transfer in the entire workflow)

Going to eliminate the rust cache for self-hosted builds, and also alter the release workflow to use the macos-tartelet runner, and see how it goes

mikehardy · 2025-10-17T02:17:53Z

mild hiccup - expanded my local Tartelet ephemeral self-hosted runner manager to do 2 at once for this push / CI run

did spawn 2 local Tart VMs but one keeps trying to self-update the github runner from 2.328.0 to 2.329.0 and it resulted in a disconnect of the second runner and a fail of the job assigned to it

Additionally, the holy grail here is to use the self-hosted runner if I make it available, but fall back to a public runner if it is offline. This does not appear to be built-in functionality, but there is an API endpoint that lists runners and their status, and you can use that in a pre-step to select which runner to use: https://github.com/orgs/community/discussions/20019#discussioncomment-13391511 (or this action, with this PR for organization runner support https://github.com/jimmygchen/runner-fallback-action/pull/28/files)
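For illustration, the selection logic in such a pre-step could look roughly like the sketch below. It only covers the decision, not the HTTP call: it assumes the JSON has already been fetched from the "list self-hosted runners for an organization" endpoint, whose response carries a `runners` array with `status` and `labels` fields. The function name `pick_runner`, the sample data, and the `macos-latest` fallback label are made up for the example; this is not how runner-fallback-action is implemented.

```python
from typing import Any


def pick_runner(api_response: dict[str, Any], primary_label: str, fallback: str) -> str:
    """Return primary_label if any online runner carries it, otherwise fallback."""
    for runner in api_response.get("runners", []):
        labels = {label["name"] for label in runner.get("labels", [])}
        if primary_label in labels and runner.get("status") == "online":
            return primary_label
    return fallback


# Hand-written sample shaped like the runner list response (hypothetical names).
sample = {
    "total_count": 2,
    "runners": [
        {"name": "tart-1", "status": "offline", "busy": False,
         "labels": [{"name": "self-hosted"}, {"name": "macos-tartelet"}]},
        {"name": "tart-2", "status": "online", "busy": False,
         "labels": [{"name": "self-hosted"}, {"name": "macos-tartelet"}]},
    ],
}

print(pick_runner(sample, "macos-tartelet", "macos-latest"))
```

The response also has a `busy` flag per runner; whether a busy runner should count as available (the job would queue) or trigger the fallback is a policy choice this sketch leaves out.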

2nd run initial findings are that the JDK download can stall. JDK17 is in the image I'm using as a base, but 21 is not, so it has to download. I should update my base VM to have what we want by default:

rustup and toolchain from rust-toolchain.toml (currently 1.89.0)
jdk from workflow definition (currently 21)(8'40" download stall)
brew install mingw-w64 (15min download stall)
brew install MaterializeInc/crosstools/x86_64-unknown-linux-gnu (another long download stall...)
gradle version from source tree (currently 9)(6min download stall)

mikehardy · 2025-10-17T02:51:28Z

Okay, some optimization thoughts above, as a proof of concept experiment I was able to take this from an idea to ... working in just about 90 minutes, which is promising. I don't see any technical difficulties really preventing me from dynamically using the self-hosted runner if available and fallback if not - I'm the one that implemented dynamic OS matrix construction in the first place and we have scripts that use the GitHub API in workflows already, it's all tech we're familiar with.
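As a sketch of what "dynamic OS matrix construction" can look like with a selectable macOS runner: a setup job emits the matrix as JSON on its outputs, and the build job reads it back with `fromJSON`. The `ubuntu-latest` and `windows-latest` labels, the output name `matrix`, and the helper names are assumptions for the example, not the repo's actual script.

```python
import json
import os


def build_matrix(macos_runner: str) -> str:
    """Serialize an OS matrix with the chosen macOS runner label as compact JSON."""
    matrix = {"os": ["ubuntu-latest", "windows-latest", macos_runner]}
    return json.dumps(matrix, separators=(",", ":"))


def output_line(name: str, value: str) -> str:
    """Format one single-line step output in the name=value form."""
    return f"{name}={value}\n"


line = output_line("matrix", build_matrix("macos-tartelet"))
print(line, end="")

# Inside a workflow step, GITHUB_OUTPUT names the file that collects step outputs.
output_path = os.environ.get("GITHUB_OUTPUT")
if output_path:
    with open(output_path, "a", encoding="utf-8") as handle:
        handle.write(line)
```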

So I'm going to sleep now, but this seems like a valid way to un-bottleneck macOS runners in this repo, which is a good thing because their unavailability is effectively blocking all development at the moment.

mikehardy · 2025-10-17T14:13:22Z

100% CPU bound now after implementation and execution of a "VM prep" script on the local Tart VM image that serves as the clone base for Tartelet ephemeral build runners.

Performance now acceptable IMHO, just a touch slower than windows now

build-quick results:

self-hosted macOS 19'3"
windows 16'27"
ubuntu 12'18"
public macOS usually 12-13'
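To put "a touch slower than windows" in numbers, the build-quick timings above can be converted to seconds and compared. This is just arithmetic on the figures quoted in this comment; the helper name `to_seconds` is made up for the example.

```python
import re


def to_seconds(stamp: str) -> int:
    """Convert a duration written as 19'3" or 12' (minutes'seconds") to seconds."""
    match = re.fullmatch(r'''(\d+)'(?:(\d+)")?''', stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    minutes = int(match.group(1))
    seconds = int(match.group(2) or 0)
    return minutes * 60 + seconds


results = {
    "self-hosted macOS": "19'3\"",
    "windows": "16'27\"",
    "ubuntu": "12'18\"",
}

baseline = to_seconds(results["windows"])
for name, stamp in results.items():
    seconds = to_seconds(stamp)
    print(f"{name}: {seconds}s, {seconds / baseline:.2f}x windows")
```

That works out to 1143s for self-hosted macOS against 987s for windows, so roughly 1.16x.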

build-release results:

self-hosted macOS 43'38"
public macOS between 47' (fastest ever) and 3'36" (slowest ever), typical ~57'

Run parameters:

rust uncached (at least 12' of the time was spent compiling rust in build-quick, much more in build-release)
both a build-quick and build-release working concurrently/parallel
Apple M2 laptop that's a bit RAM-starved

Very positive result. The public runners are clearly faster but there is still room for improvement on self-hosted perf, and it's already usefully quick.

Still a couple items to resolve

the rust cache itself (perhaps uncached for build-quick, but cached for build-release? Perhaps a local cache and tell rust where to find it on the base image? perhaps a self-hosted-specific lower cache size? or just lower size in general?)
the ability to fallback to public runners if my self-hosted runners are offline so I can maintain control over my local compute without blocking all CI

mikehardy · 2025-10-17T16:54:38Z

Rust caching was a nest of bugs, so I peeled it into a separate PR (#588) that is merging now; then I'll rebase this and re-push before moving forward


this will install all the build pre-requisites for a Tartelet/Tart self-hosted VM and warm up the cache with one build run. It should be used on the persistent base Tart VM image, prior to starting Tartelet, which will clone that image into ephemeral runners

mikehardy · 2025-10-17T19:29:26Z

Okay - ready to go

caching all fixed up, and it's improved in general on main now plus even more here
fallback to public runners is integrated and is tested + working (I've seen it use public runners when mine were offline just now)
- using personal fork of action+PR to have stable ref for integration of this feat: adding ability to run for orgs/ents jimmygchen/runner-fallback-action#28

Has a couple little fixes in separate commits, wasn't worth separating IMHO

If CI goes green, this current run represents the failure / runners offline test, so that means the new worst case works

And the best case is 2 more macos runners available sometimes, that work a bit faster than the public ones

They are at the organization level so are available (as is the fallback action token) in all ankidroid repos as desired.

mikehardy force-pushed the tartelet-test branch from 8d79219 to be9a087 Compare October 17, 2025 13:51

mikehardy force-pushed the tartelet-test branch from 52d9068 to 8844f8b Compare October 17, 2025 16:15

mikehardy mentioned this pull request Oct 17, 2025

Ci caching fixes #588

Merged

mikehardy force-pushed the tartelet-test branch from 8844f8b to dc752e5 Compare October 17, 2025 17:50

mikehardy added 4 commits October 17, 2025 14:22

perf(ci): use self-hosted runner for macos if available

9ef1710

chore: remove vestigial hard-coded NDK from doctor

a4883ad

chore: put brew install commands on their own line

8f682c0

this makes them more or less machine-parseable for workflow runner preparation scripts
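As an illustration of why one `brew install` per line helps, a preparation script can pull the package names out of a workflow file with a simple line-based match instead of a real YAML or shell parser. The sketch below assumes that layout; the workflow snippet and the function name are hypothetical, not copied from the repo.

```python
import re

# Matches a line that consists of a single "brew install ..." command.
BREW_LINE = re.compile(r"^\s*(?:run:\s*)?brew install\s+(\S.*?)\s*$")


def brew_packages(workflow_text: str) -> list[str]:
    """Collect package names from lines of the form 'brew install <pkg> ...'."""
    packages: list[str] = []
    for line in workflow_text.splitlines():
        match = BREW_LINE.match(line)
        if match:
            # Skip option flags such as --quiet, keep package names.
            packages.extend(p for p in match.group(1).split() if not p.startswith("-"))
    return packages


# Hypothetical workflow fragment with one brew install per line.
SAMPLE = """\
      - name: Install cross toolchains
        run: |
          brew install mingw-w64
          brew install MaterializeInc/crosstools/x86_64-unknown-linux-gnu
"""

print(brew_packages(SAMPLE))
```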

mikehardy force-pushed the tartelet-test branch from 2bac98a to dbccd86 Compare October 17, 2025 19:23

mikehardy changed the title ~~WIP - experiment with a self-hosted macos runner~~ perf(ci): use self-hosted macos runner if available, fallback to public runners if not Oct 17, 2025

perf(ci): NDK install uses rust, restore rust cache prior

49e1f4f

mikehardy force-pushed the tartelet-test branch from dbccd86 to 49e1f4f Compare October 17, 2025 19:43

mikehardy merged commit eeaded1 into main Oct 17, 2025
9 checks passed

mikehardy deleted the tartelet-test branch October 17, 2025 20:54

mikehardy merged 5 commits into main from tartelet-test