✨ Cancel instance refresh on any relevant change to ASG instead of blocking until previous one is finished (which may have led to failing nodes due to outdated join token) #5543
+236
−108
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
(reopened from #5173 because I didn't have enough permissions to force-push)
What type of PR is this?
What this PR does / why we need it:
Changing any relevant
spec.*
for anAWSMachinePool
triggers rolling of nodes via ASG instance refresh. If another change happens shortly afterwards, it has to wait until the first rollout is done, and will then trigger another instance refresh. But it is neither necessary nor desired to roll all worker nodes twice in such a case, and it's much slower. Instead, cancel the first pending instance refresh, wait until another one can be started, and apply the latest change as soon as possible with the second instance refresh.This change has been running fine in Giant Swarm's CAPA fork for almost a year at the time of opening this PR.
Checklist:
Release note: