-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Wait for the VMSS capacity update for 1 minute before async #8242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6fb364d
to
81c06dd
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: hakman The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
b8302f3
to
1a249c1
Compare
Signed-off-by: Ciprian Hacman <[email protected]>
1a249c1
to
5292339
Compare
go func() { | ||
ctx, cancel := getContextWithTimeout(asyncContextTimeout) | ||
defer cancel() | ||
_ = scaleSet.waitForCreateOrUpdateInstances(ctx, true, future) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than throw away this error here I'd get it and do something like this:
klog.Errorf("waitForCreateOrUpdateInstances(%s) failed, err: %v", scaleSet.Name, err)
And then get rid of the above log statement in waitForCreateOrUpdateInstances
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I also kept the success message, same as I did above.
1f42625
to
54423f8
Compare
54423f8
to
05a33ae
Compare
Signed-off-by: Ciprian Hacman <[email protected]>
05a33ae
to
a292788
Compare
The issue should also be addressed by setting |
What type of PR is this?
/kind bug
What this PR does / why we need it:
In rare situations when the VMSS fails to scale (OverconstrainedAllocationRequest, OverconstrainedZonalAllocationRequest, ...), the error is not handled until the context timeout expires (max-node-provision-time).
Which issue(s) this PR fixes:
Fixes: N/A
Special notes for your reviewer:
This is just a suggestion from someone that is not that familiar with the CAS code, there may be better options available.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
N/A