Delay in starting an EC2 instance #2045
Hi,

We've noticed lately that some runs are staying in this state for a very long time. The job ultimately does start, but I'm wondering what the typical cause of such an issue could be.

In CloudWatch I do see the following:

- Very first occurrence of a log associated with the run I started
- Corresponding EC2 instance being started

That is a bit more than 3 hours between the job first being received and the EC2 instance being started. It does not happen on all jobs; we've only noticed it recently, and our initial investigation seems to point towards matrix and/or nightly jobs. We are using these settings:

We are not hitting the maximum runner count. For example, I'm seeing the issue on a job right now and there is only one GitHub runner started.

Any pointers to help us understand where the issue could come from would be greatly appreciated. Thanks,
Replies: 1 comment
Most likely a case of a runner being stolen by another job. I.e. imagine job A and B are launched, and use the same `runs-on` labels. If runner A fails to start, runner B might be assigned to job A, while job B hangs for a while, until job C with the same labels is started. At this point job B might start executing, while job C hangs, etc.

Best way to debug this would be to assign `${{ github.run_id }}` in your `runs-on` labels to force the runner to process the job it was started for, but I don't think this project supports that.
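For illustration, here is a minimal sketch of what the workflow side of that idea could look like, assuming the runner launched for a given run were also registered with a matching per-run label (which, as noted above, this project may not support). The `run-${{ github.run_id }}` label and the workflow name are made up for this example:

```yaml
name: pin-job-to-its-runner   # hypothetical example workflow
on: workflow_dispatch

jobs:
  build:
    # Including the run ID in the runs-on labels means only a runner
    # registered with the same "run-<id>" label can pick this job up,
    # so it cannot be "stolen" by a runner started for another run.
    runs-on: [self-hosted, linux, "run-${{ github.run_id }}"]
    steps:
      - uses: actions/checkout@v4
      - run: echo "Handled by the runner dedicated to run ${{ github.run_id }}"
```

The same label would have to be attached to the runner at registration time for the job to ever be picked up, which is the part the commenter doubts this project currently supports.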