-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Add some race-condition protection to VPA recommender #8320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Also move it to a private field, so that reads/writes are all via the protected methods
From https://pkg.go.dev/flag#PrintDefaults > The listed type, here int, can be changed by placing a back-quoted > name in the flag's usage string; the first such item in the message > is taken to be a parameter name to show in the message and the back > quotes are stripped from the message when displayed.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: adrianmoisey The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -125,7 +125,7 @@ This document is auto-generated from the flag definitions in the VPA recommender | |||
| `storage` | string | | Specifies storage mode. Supported values: prometheus, checkpoint | | |||
| `target-cpu-percentile` | float | 0.9 | CPU usage percentile that will be used as a base for CPU target recommendation. Doesn't affect CPU lower bound, CPU upper bound nor memory recommendations. | | |||
| `target-memory-percentile` | float | 0.9 | Memory usage percentile that will be used as a base for memory target recommendation. Doesn't affect memory lower bound nor memory upper bound. | | |||
| `update-worker-count` | | 10 | kube-api-qps Number of concurrent workers to update VPA recommendations and checkpoints. When increasing this setting, make sure the client-side rate limits (kube-api-qps and `kube-api-burst`) are either increased or turned off as well. Determines the minimum number of VPA checkpoints written per recommender loop. | | |||
| `update-worker-count` | int | 10 | Number of concurrent workers to update VPA recommendations and checkpoints. When increasing this setting, make sure the client-side rate limits ('kube-api-qps' and 'kube-api-burst') are either increased or turned off as well. Determines the minimum number of VPA checkpoints written per recommender loop. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I happen to notice this weirdness in the --update-worker-count
due to the back-ticks, so I fixed them while I was here.
Reference: https://pkg.go.dev/flag#PrintDefaults
The listed type, here int, can be changed by placing a back-quoted name in the flag's usage string; the first such item in the message is taken to be a parameter name to show in the message and the back quotes are stripped from the message when displayed
oc.mutex.Lock() | ||
oc.cnt[key]++ | ||
oc.mutex.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at changing this to use atomic.AddInt64()
.
It works, but my concern is that the map need to gaurentee that all keys exist in the map. This is possible by pre-filling the map with all possible combinations. However, we need to remember to update this file each time a new type is added. Ie:
autoscaler/vertical-pod-autoscaler/pkg/utils/metrics/recommender/recommender.go
Lines 41 to 46 in 9bc4220
modes = []string{ | |
string(vpa_types.UpdateModeOff), | |
string(vpa_types.UpdateModeInitial), | |
string(vpa_types.UpdateModeRecreate), | |
string(vpa_types.UpdateModeAuto), | |
} |
The problem is that there doesn't seem to be a good way to dynamically generate that list in the VPA.
What we may be able to do is write a generate script that will keep this file up to date, which we can run in CI.
What type of PR is this?
/kind bug
What this PR does / why we need it:
#7992 introduces goroutines in order to process a batch of VPAs quicker. This inadvertently lead to some race conditions that we're picked up in our tests.
This PR attempts to fix those race conditions
Which issue(s) this PR fixes:
Fixes #8318
Special notes for your reviewer:
Opening this PR early, I still plan to write some tests. I'd like to make a breaking-test that these fixes solve.
/hold
Does this PR introduce a user-facing change?