
Prevent race leading to stale instances and runners #206

Open
hanno-becker opened this issue Nov 17, 2024 · 0 comments


hanno-becker commented Nov 17, 2024

We observed a gradual buildup of runners and instances that ultimately caused our CI to grind to a halt.

The problem seems to be that the `start` job is not marked `always()`. This opens a race condition: an EC2 instance is started, but the `start` job is cancelled before it reports back. In that case, the `stop` job cannot terminate the instance because it never received the instance ID. Similarly, the self-hosted runner can be left orphaned.

It seems that the `start` job in an ec2-github-runner-based workflow must be marked `always()` so that it cannot be cancelled and the above race cannot occur.

Note that cancellation of `start` jobs is common when the workflow is part of a concurrency group. For example, if the workflow is triggered by updates to a given PR, occasional fast back-to-back updates to that PR lead to the race and, over time, to the buildup of orphaned runners and instances.
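For illustration, a concurrency setup of the kind that can trigger this might look as follows. The trigger and group expression are assumptions, not copied from an actual workflow. The relevant part is `cancel-in-progress: true`: without it, GitHub only cancels *pending* runs in the group, whereas with it a new run also cancels the one already in progress, including a `start` job that is mid-flight.

```yaml
# Hypothetical trigger: run on every update to a pull request.
on:
  pull_request:

# One concurrency group per workflow and ref (i.e. per PR here).
# cancel-in-progress cancels the run that is still executing when a new
# push arrives, which can hit a start job that has already launched an
# EC2 instance but has not yet published its outputs.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```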

Suggestion: Change README.md to mark the `start` job as `always()`, and document why this is important.
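A minimal sketch of what the README example could look like with this change, assuming the action is referenced as `machulav/ec2-github-runner@v2`. The job names, AMI, instance type, subnet, security group, and secret name are illustrative placeholders, and the AWS credential setup steps are omitted. The key line is `if: ${{ always() }}` on `start-runner`. When a run is cancelled, GitHub re-evaluates the `if` condition of running jobs and does not cancel those whose condition is true. The job therefore finishes and publishes the label and instance ID that `stop-runner` needs.

```yaml
jobs:
  start-runner:
    runs-on: ubuntu-latest
    # Proposed change: always() keeps this job from being cancelled mid-flight,
    # so its outputs (runner label, instance ID) are always reported.
    if: ${{ always() }}
    outputs:
      label: ${{ steps.start-ec2-runner.outputs.label }}
      ec2-instance-id: ${{ steps.start-ec2-runner.outputs.ec2-instance-id }}
    steps:
      # AWS credential configuration omitted for brevity.
      - name: Start EC2 runner
        id: start-ec2-runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: start
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          ec2-image-id: ami-123        # placeholder
          ec2-instance-type: t3.nano   # placeholder
          subnet-id: subnet-123        # placeholder
          security-group-id: sg-123    # placeholder

  do-the-job:
    needs: start-runner
    runs-on: ${{ needs.start-runner.outputs.label }}
    steps:
      - run: echo "Running on the self-hosted EC2 runner"

  stop-runner:
    needs: [start-runner, do-the-job]
    runs-on: ubuntu-latest
    # always() so cleanup runs even if do-the-job fails or is cancelled.
    if: ${{ always() }}
    steps:
      # AWS credential configuration omitted for brevity.
      - name: Stop EC2 runner
        uses: machulav/ec2-github-runner@v2
        with:
          mode: stop
          github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          label: ${{ needs.start-runner.outputs.label }}
          ec2-instance-id: ${{ needs.start-runner.outputs.ec2-instance-id }}
```

The trade-off is that a cancelled run no longer stops immediately. `do-the-job` is still cancelled, but the run presumably waits for `start-runner` to complete and for `stop-runner` to clean up. That cost seems small compared to leaking instances and runners.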
