check the scalability of one kind prior to get its scale subresource from apiserver #6431
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: yunwang0911. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@kgolab @kwiesmueller @jbartosik @voelzmo @krzysied Could anyone kindly review this PR?
Thanks for the PR!
I have one direct comment about the code changes, where I'm not sure it can actually work this way.
Apart from that single comment: could you help me understand a bit more what the original problem you're trying to solve is? Which calls exactly are you trying to avoid, and in which scenario do they happen?
We do already have a cache implementation for the scale subresource per resource, so that's probably not it?
In your PR description you write
It works ok in most of the time, but it will lead to client-go limitation when the top most controller isn't well known and isn't scalable (...)
The topmost resource will need to be either well-known or scalable, otherwise the VPA will throw an error about misconfiguration. A similar error happens if the resource in the targetRef is not scalable or well-known.
Maybe you can offer a concrete example where you see this happening, to help me understand better what this PR is solving. Thanks!
@@ -206,6 +209,12 @@ func (f *controllerFetcher) getParentOfController(controllerKey ControllerKeyWit
		return getParentOfWellKnownController(informer, controllerKey)
	}

	// check if it's scalable
	scalable := f.isScalable(controllerKey.ApiVersion, controllerKey.Kind)
	if !scalable {
I'm not sure this early exit actually works? A Pod might have an owner reference that is neither well known nor scalable. As long as the topmost owner (the end of the "ownership chain") is scalable, this should be sufficient. With this change, we would return nil, as if this controller didn't have an owner, which doesn't seem correct.
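For illustration, a hypothetical ownership chain (all kinds and names below are invented) that shows why the early exit is problematic:

// Hypothetical ownership chain; every kind and name here is made up for illustration.
//
//   Pod "web-7d4f9"
//     -> owner: kind=CanaryBatch     (CRD without a /scale subresource, not well known)
//       -> owner: kind=CanaryRollout (CRD that does expose /scale)
//
// The walk up the chain has to pass through CanaryBatch even though it is neither
// well known nor scalable; only the topmost owner (CanaryRollout) needs to be
// scalable. Returning nil at CanaryBatch would wrongly report that the controller
// has no owner, so the VPA would never reach the scalable topmost resource.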
What if the topmost owner (the end of the "ownership chain") isn't scalable? Even if the spec.TargetRef is scalable, this VPA still won't be synced.
For example, there is one CRD, called GlobalDeployment, which owns a Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
  ownerReferences:
  - apiVersion: apps.xx.xx/v1
    controller: true
    kind: GlobalDeployment
    name: nginx
    uid: xxx
spec:
  ...

And the targetRef of the VPA is this Deployment.
Thanks for adding additional context, @yunwang0911! I think I understand your use-case and what you're trying to improve here. I'm curious if you saw a decrease in calls made to the kube-apiserver after running a vpa-recommender with these changes?

Looking at the code paths that your PR is touching leads me to believe that you're trying to improve the flow in the controller fetcher. Getting the REST mappings uses the very same CachedDiscoveryClient that you're using in your PR, and checking the scale subresource is already backed by a cache.

My thinking is that, given the above-mentioned amount of caching on that code path, this will probably not reduce the calls made to the kube-apiserver by much – except when you avoid that the item gets added to the cache and therefore doesn't get refreshed.

Is my understanding of your concerns correct up to this point, and are you mainly concerned with the 10 minute cache refresh interval? If we want to change this, we can think about two different things:

In both cases, I'm still kind of concerned with the impact on scenarios where getting the scale subresource fails. WDYT?
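For reference, a rough sketch of this kind of discovery-side caching; the helper name, package aliases, and reset period are assumptions, not necessarily the recommender's exact wiring:

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/discovery"
	cacheddiscovery "k8s.io/client-go/discovery/cached/memory"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/restmapper"
)

// newCachedMapper is a hypothetical helper: it wires a memory-cached discovery
// client into a deferred RESTMapper and resets it periodically, so REST mappings
// and API resource lists are served from memory between refreshes.
func newCachedMapper(config *rest.Config, resetPeriod time.Duration, stopCh <-chan struct{}) *restmapper.DeferredDiscoveryRESTMapper {
	cachedClient := cacheddiscovery.NewMemCacheClient(discovery.NewDiscoveryClientForConfigOrDie(config))
	mapper := restmapper.NewDeferredDiscoveryRESTMapper(cachedClient)
	// Invalidate the in-memory discovery data periodically (e.g. every 10 minutes).
	go wait.Until(mapper.Reset, resetPeriod, stopCh)
	return mapper
}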
groupResource := mapping.Resource.GroupResource()
scale, err := f.getScaleForResource(key.Namespace, groupResource, key.Name)
if err == nil && scale != nil {
	expectedScaleResource := fmt.Sprintf("%ss/%s", strings.ToLower(kind), "scale")
This code assumes that the plural form of a resource is always in the form of singular + "s". In many cases this is true, but not all the time. See e.g. the prometheus CRD, where the plural is "prometheuses". It could also have been "prometheis". In short: I don't think it is possible to take this shortcut here.
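For what it's worth, a minimal sketch of resolving the canonical plural via the RESTMapper instead of guessing "<lowercase kind>s"; the helper name is made up:

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// pluralFor is a hypothetical helper that asks the RESTMapper for the canonical
// plural resource name of a kind, e.g. "prometheuses" rather than "prometheuss".
func pluralFor(mapper meta.RESTMapper, apiVersion, kind string) (string, error) {
	gv, err := schema.ParseGroupVersion(apiVersion)
	if err != nil {
		return "", err
	}
	mapping, err := mapper.RESTMapping(schema.GroupKind{Group: gv.Group, Kind: kind}, gv.Version)
	if err != nil {
		return "", err
	}
	// mapping.Resource.Resource is the plural reported by the server, not a guess.
	return mapping.Resource.Resource, nil
}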
This code assumes that the plural form of a resource is always in the form of singular + "s". In many cases, this is true, but not all the time.

Yes, you are right. Any suggestions?
@voelzmo I tried to get the plural name from the APIResource. Please take a look again.
Yes, the calls made to the apiserver indeed decrease.
Yes, I'm trying to improve the function below:
func (f *controllerFetcher) getScaleForResource(namespace string, groupResource schema.GroupResource, name string) (controller *autoscalingapi.Scale, err error) {
if ok, scale, err := f.scaleSubresourceCacheStorage.Get(namespace, groupResource, name); ok {
return scale, err
}
scale, err := f.scaleNamespacer.Scales(namespace).Get(context.TODO(), groupResource, name, metav1.GetOptions{})
f.scaleSubresourceCacheStorage.Insert(namespace, groupResource, name, scale, err)
return scale, err
}

You see, for every object of this kind, it's required to call the apiserver every time until the VPA is deleted.
As the function below shows, the function isScalable only consults the cached discovery client:

// isScalable returns true if the given controller is scalable.
// isScalable checks if the controller is scalable by checking if the resource "<key>/scale" exists in the cached discovery client.
func (f *controllerFetcher) isScalable(apiVersion string, kind string) bool {
resourceList, err := f.cachedDiscoveryClient.ServerResourcesForGroupVersion(apiVersion)
if err != nil {
klog.Errorf("Could not find resources for %s: %v", apiVersion, err)
return false
}
expectedScaleResource := fmt.Sprintf("%ss/%s", strings.ToLower(kind), "scale")
for _, r := range resourceList.APIResources {
if r.Name == expectedScaleResource {
return true
}
}
return false
}
for _, mapping := range mappings {
	groupResource := mapping.Resource.GroupResource()
	scale, err := f.getScaleForResource(key.Namespace, groupResource, key.Name)
I wonder if one reason we need to do this lookup to the API Server is for RBAC purposes (as @voelzmo mentions a bit earlier)... Do you happen to know what the impact would be? (I'm digging around a bit myself but thought I'd throw it out there.)

If VPA doesn't have the correct permissions on the object (aka get on /scale), this should fail? With the new discovery-based lookup, this would pass instead.
I don't think so. If we want to check RBAC, we should use SubjectAccessReview. Besides, even if it passes here, the call chain getParentOfController -> getOwnerForScaleResource -> getScaleForResource will ultimately reach the function getScaleForResource. What do you think?
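For context, a minimal sketch of such a SelfSubjectAccessReview check ("may I get deployments/scale in this namespace?"); the helper name and the example group/resource are assumptions:

import (
	"context"

	authorizationv1 "k8s.io/api/authorization/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// canGetScale is a hypothetical probe: it asks the apiserver whether the current
// identity may "get" the /scale subresource of deployments in the given namespace.
func canGetScale(ctx context.Context, client kubernetes.Interface, namespace string) (bool, error) {
	sar := &authorizationv1.SelfSubjectAccessReview{
		Spec: authorizationv1.SelfSubjectAccessReviewSpec{
			ResourceAttributes: &authorizationv1.ResourceAttributes{
				Namespace:   namespace,
				Verb:        "get",
				Group:       "apps",
				Resource:    "deployments",
				Subresource: "scale",
			},
		},
	}
	resp, err := client.AuthorizationV1().SelfSubjectAccessReviews().Create(ctx, sar, metav1.CreateOptions{})
	if err != nil {
		return false, err
	}
	return resp.Status.Allowed, nil
}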
groupResource := mapping.Resource.GroupResource()
scale, err := f.getScaleForResource(key.Namespace, groupResource, key.Name)
if err == nil && scale != nil {
	expectedScaleResource := fmt.Sprintf("%s/%s", plural, "scale")
(Assuming the RBAC issues mentioned earlier are OK.)

Could we use the ScaleKindResolver? If you look at the implementation, it is effectively doing what you are doing here by looping over the DiscoveryClient resources. Using that directly would cut down the amount of code needed here :)
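For reference, a rough sketch of what a ScaleKindResolver-based check could look like; the helper name and the example GroupVersionResource are assumptions:

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/scale"
)

// hasScaleSubresource is a hypothetical helper: ScaleForResource walks the
// discovery information and returns an error when the resource has no /scale.
func hasScaleSubresource(dc discovery.ServerResourcesInterface, gvr schema.GroupVersionResource) bool {
	resolver := scale.NewDiscoveryScaleKindResolver(dc)
	_, err := resolver.ScaleForResource(gvr)
	return err == nil
}

// Example call with an assumed GVR:
//   hasScaleSubresource(dc, schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"})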
Good point! I'll replace it with ScaleKindResolver. Thanks!
Updated
Sorry, I misunderstood the function of ScaleKindResolver. Actually, it doesn't work as I expected. Let me elaborate on the difference between the current implementation, ScaleKindResolver, and the implementation in this PR.
@voelzmo @raywainman Sorry for my unclear explanation.

One line to describe the difference: the former two methods check whether the scale subresource of a given resource object exists, while this PR checks whether the Scale APIResource of the kind of a given resource exists.

Method 1: current implementation - scaleSubresourceCacheStorage call chain

At the fifth step, it tries to get the scale from scaleSubresourceCacheStorage; if that succeeds, good, but if it fails it gets it from the apiserver:

scale, err := f.scaleNamespacer.Scales(namespace).Get(context.TODO(), groupResource, name, metav1.GetOptions{})

It tries to get the scale of resource namespace/name from the apiserver, which means the number of calls to the apiserver might be equal to the number of resources. If the resource is scalable, such as a Deployment or StatefulSet, it's OK, because next time it is able to get the scale object from the cache.

If there are a lot of DeploymentGroup objects (the name is important here) and the VPAs reference Deployments, the recommender will break.
@yunwang0911 Thank you for the really thorough explanation, let me take a bit of time to look through this and get back to you (should have time tomorrow during the day, EDT timezone).
Thanks for the nice picture, @yunwang0911! I still think that the cause for the many requests to the kube-apiserver is not the checks themselves.

Instead, the issue is that we're adding the negative result (not found error) to the cache. We also have other logic using this cache. That being said, I think the way the cache is designed doesn't really fit what it tries to accomplish. I'd rather try to change how/what we're caching instead of working around this.

Sorry for the long comment; I'm trying to express that I do understand that you're trying to solve an important problem, but I don't think we should be putting another bandaid on the concept here. I might be misunderstanding how all of this works; maybe it is worthwhile trying to ping @jbartosik to get some historical context here?
@voelzmo Yeah, here is the relevant code:

func (f *controllerFetcher) getOwnerForScaleResource(groupKind schema.GroupKind, namespace, name string) (*ControllerKeyWithAPIVersion, error) {
...
mappings, err := f.mapper.RESTMappings(groupKind)
if err != nil {
return nil, err
}
var lastError error
for _, mapping := range mappings {
// only use RESTMapper to get the resource of the kind, not check if /scale is supported in this kind.
groupResource := mapping.Resource.GroupResource()
scale, err := f.getScaleForResource(namespace, groupResource, name)
if err == nil {
return getOwnerController(scale.OwnerReferences, namespace), nil
}
lastError = err
}
return nil, lastError
}
func (f *controllerFetcher) getScaleForResource(namespace string, groupResource schema.GroupResource, name string) (controller *autoscalingapi.Scale, err error) {
// check cache
if ok, scale, err := f.scaleSubresourceCacheStorage.Get(namespace, groupResource, name); ok {
return scale, err
}
// if not exist in cache, call apiserver to get the scale of the kind, right?
scale, err := f.scaleNamespacer.Scales(namespace).Get(context.TODO(), groupResource, name, metav1.GetOptions{})
f.scaleSubresourceCacheStorage.Insert(namespace, groupResource, name, scale, err)
return scale, err
}

Per the code, the RESTMapper is only used to convert the GVK to a GVR. It checks scalability by two methods: 1. checking the cache, 2. calling the apiserver to get /scale. The flow chart is as follows.

However, we should check whether the kind supports /scale before getting /scale from the apiserver. As we all know, if a kind doesn't support /scale, getting /scale from the apiserver always fails. The logic is as follows.
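To make the proposed ordering concrete, a small sketch; the wrapper name is made up, while the other identifiers follow the snippets above:

// ownerIfScalable is a hypothetical wrapper around the existing helpers: it
// consults the cached discovery data first and only asks the apiserver for
// /scale when the kind actually exposes that subresource.
func (f *controllerFetcher) ownerIfScalable(key ControllerKeyWithAPIVersion, groupResource schema.GroupResource) (*ControllerKeyWithAPIVersion, error) {
	if !f.isScalable(key.ApiVersion, key.Kind) {
		// The kind has no /scale subresource; skip the per-object apiserver call.
		return nil, nil
	}
	scale, err := f.getScaleForResource(key.Namespace, groupResource, key.Name)
	if err != nil {
		return nil, err
	}
	return getOwnerController(scale.OwnerReferences, key.Namespace), nil
}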
Going along with @voelzmo's suggestion above to perhaps flip this and look at the role of the cache in this... Could we change the way the cache keys its entries? Instead of adding each resource object into the cache, we could change it to key only by kind.

If it's not possible to change this cache (I haven't looked at all the users of this cache, but it looks like @voelzmo did an analysis as well), maybe we think about creating a new cache with this new key schema and using it here? Using this new "per-kind" cache, you would then only ever go to the API server once per kind per 10 minutes. Wdyt?
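A rough sketch of such a per-kind cache; the type and method names are made up and are not existing VPA code:

import (
	"sync"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

// kindScalabilityCache is a hypothetical cache keyed by GroupResource only, so a
// negative result ("no /scale") is stored once per kind instead of once per object.
type kindScalabilityCache struct {
	mu      sync.RWMutex
	entries map[schema.GroupResource]bool
}

func (c *kindScalabilityCache) Get(gr schema.GroupResource) (scalable, found bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	scalable, found = c.entries[gr]
	return
}

func (c *kindScalabilityCache) Set(gr schema.GroupResource, scalable bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.entries == nil {
		c.entries = map[schema.GroupResource]bool{}
	}
	c.entries[gr] = scalable
}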
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
@raywainman what about the ideas regarding changing how the cache works? Should we make an issue out of this and reduce the number of calls made?
Maybe we can also ping @jbartosik for this one, given that he was driving the cache development back in the day?
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.

/close
@k8s-triage-robot: Closed this PR.
/reopen
@yunwang0911: Failed to re-open PR: state cannot be changed. The perf branch was force-pushed or recreated.
/reopen
@yunwang0911: Reopened this PR.
@voelzmo @raywainman I reopened this PR since I think it is really helpful. Besides, I saw your discussion in PR #7517. I agree that "we don't need to make the calls checking /scale for every object of a kind, it is fine to do it per kind or GroupResource", and this PR is for that purpose.
What type of PR is this?
/kind bug
What this PR does / why we need it:
In the VPA recommender, the controller fetcher gets the topmost well-known or scalable controller using the function FindTopMostWellKnownOrScalable. It works OK most of the time, but it leads to client-go rate limiting when the topmost controller is neither well known nor scalable and the number of VPAs is large, like 3k+, since it tries to get the scale subresource for every topmost object. This PR checks the topmost controller's scale subresource by finding the resource name <pluralObject>/scale in the cached discovery client. If the topmost controller has no scale subresource, it fails and returns nil directly. The whole process is handled in memory, with no need to query the kube-apiserver.

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
No user-facing change.
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: