diff --git a/keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md b/keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md
new file mode 100644
index 00000000000..8f9b28283a3
--- /dev/null
+++ b/keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md
@@ -0,0 +1,843 @@
+# KEP-4951: Configurable tolerance for HPA
+
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [Test Plan](#test-plan)
+      - [Prerequisite testing updates](#prerequisite-testing-updates)
+      - [Unit tests](#unit-tests)
+      - [Integration tests](#integration-tests)
+      - [e2e tests](#e2e-tests)
+  - [Graduation Criteria](#graduation-criteria)
+    - [Alpha](#alpha)
+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+    - [Upgrade](#upgrade)
+    - [Downgrade](#downgrade)
+  - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring Requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
+
+## Release Signoff Checklist
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [ ] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+[Horizontal Pod Autoscaler][] (HPA) regularly estimates how many replicas a given Deployment (or other resource with a `/scale` subresource) should instantiate.
+HPAs define one (or more) metrics (e.g. CPU utilization) on which autoscaling is based.
+The number of replicas is derived from the ratio between the *current* and *desired* values of this metric ([Algorithm details][]).
+
+For example, for a workload with 100 `currentReplicas` and a usage ratio
+(`currentMetricValue`/`desiredMetricValue`) of 1.07, the calculated `desiredReplicas`
+would be 107 (100 * 1.07).
+
+However, to avoid flapping, scaling actions are skipped if the usage ratio is within a
+globally configurable *tolerance* of 1.0, set to 10% by default. In the example above, no scaling
+action would take place, since the ratio is within this tolerance.
+
+This proposal adds a parameter to HPAs allowing users to configure this tolerance per HPA resource.
+For the example above, we could configure the tolerance in the workload's HPA to 5%, which would
+allow the scale-up to 107 replicas to proceed.
+
+[Horizontal Pod Autoscaler]: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
+[Algorithm details]: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
+
+## Motivation
+
+Today the HPA tolerance is a cluster-wide parameter, set with the `--horizontal-pod-autoscaler-tolerance` flag of the [`kube-controller-manager`][Kube Control Manager], and it defaults to 10%. While this value is often appropriate, it is too coarse-grained for a number of scenarios.
+
+This issue has been raised multiple times ([#116984][], [#125987][], [#62013][],
+[#aks-3068][], [#keda-1100][]), with users commenting that:
+
+1. For large deployments, a 10% tolerance can correspond to a very significant amount of resources (e.g. hundreds of pods).
+2. This tolerance can slow down scaling operations, hindering responsiveness in
+case of surges.
+3. Scale-ups are more of a concern than scale-downs, since pods are typically slower to initialize than to shut down, and responding to increased load is typically more critical than freeing resources.
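+
+The Summary example can be reproduced with a short Go sketch. It assumes a simplified,
+single-metric model in which `desiredReplicas = ceil(currentReplicas * usageRatio)`, as described
+in the HPA algorithm details, with today's symmetric tolerance check; `desiredReplicas` is a
+hypothetical helper, not the controller's actual code.
+
+```golang
+package main
+
+import (
+	"fmt"
+	"math"
+)
+
+// desiredReplicas is a hypothetical, simplified single-metric version of the
+// HPA calculation: if usageRatio (currentMetricValue / desiredMetricValue) is
+// within tolerance of 1.0, the replica count is left unchanged; otherwise it is
+// scaled proportionally and rounded up.
+func desiredReplicas(currentReplicas int32, usageRatio, tolerance float64) int32 {
+	if math.Abs(1.0-usageRatio) <= tolerance {
+		return currentReplicas // within tolerance: no scaling action
+	}
+	return int32(math.Ceil(float64(currentReplicas) * usageRatio))
+}
+
+func main() {
+	// 100 replicas with a usage ratio of 1.07, as in the Summary.
+	fmt.Println(desiredReplicas(100, 1.07, 0.10)) // default 10% tolerance: prints 100
+	fmt.Println(desiredReplicas(100, 1.07, 0.05)) // 5% tolerance: prints 107
+}
+```
+
+Because this check is symmetric, a single value currently governs both scale-ups and scale-downs.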
+
+Since appropriate tolerance values are workload-dependent, this KEP proposes to let users add custom tolerance values to `HorizontalPodAutoscaler` resources, overriding the existing default value when present.
+
+This solution integrates seamlessly with the existing HPA API, which already allows users to [fine-tune the autoscaler behavior](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#configurable-scaling-behavior).
+The exact API recommended here was previously proposed in [KEP-853][kep-853] (see [here](https://github.com/kubernetes/enhancements/pull/1234#discussion_r333036990)), but at the time it was decided to implement it separately.
+
+[#116984]: https://github.com/kubernetes/kubernetes/issues/116984
+[#125987]: https://github.com/kubernetes/kubernetes/issues/125987
+[#62013]: https://github.com/kubernetes/kubernetes/issues/62013
+[#aks-3068]: https://github.com/Azure/AKS/issues/3068
+[#keda-1100]: https://github.com/kedacore/keda-docs/issues/1100
+[Kube Control Manager]: https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/
+[kep-853]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-autoscaling/853-configurable-hpa-scale-velocity/README.md
+
+### Goals
+
+- Allow users to optionally override the default workload autoscaling tolerance on a per-HPA basis.
+
+### Non-Goals
+
+- Changing how the cluster-wide tolerance is configured through the `kube-controller-manager` `--horizontal-pod-autoscaler-tolerance` flag.
+
+## Proposal
+
+We propose to add a new field to the existing [`HPAScalingRules`][HPAScalingRules] object:
+
+- `tolerance`: (`resource.Quantity`) the minimum change (from 1.0) in the ratio of current to desired metric values for the horizontal pod autoscaler to consider scaling. Must be greater than or equal to 0.
+
+The `tolerance` field is optional, and when not specified the HPA will continue to use the
+value of the global `--horizontal-pod-autoscaler-tolerance` flag as the tolerance for scaling
+calculations.
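+
+The defaulting and validation rules above can be sketched as follows. The sketch assumes the
+per-HPA value has already been converted to a `*float64` (the proposed field is a
+`*resource.Quantity`); `globalTolerance`, `effectiveTolerance` and `validateTolerance` are
+hypothetical names used only for illustration.
+
+```golang
+package main
+
+import (
+	"errors"
+	"fmt"
+)
+
+// globalTolerance stands in for the kube-controller-manager
+// --horizontal-pod-autoscaler-tolerance flag (0.1, i.e. 10%, by default).
+const globalTolerance = 0.1
+
+// validateTolerance is a hypothetical check for the rule that tolerance,
+// when set, must be greater than or equal to 0.
+func validateTolerance(t *float64) error {
+	if t != nil && *t < 0 {
+		return errors.New("tolerance must be greater than or equal to 0")
+	}
+	return nil
+}
+
+// effectiveTolerance returns the per-HPA tolerance when set (non-nil) and
+// falls back to the cluster-wide default otherwise.
+func effectiveTolerance(perHPA *float64) float64 {
+	if perHPA != nil {
+		return *perHPA
+	}
+	return globalTolerance
+}
+
+func main() {
+	custom := 0.05
+	fmt.Println(effectiveTolerance(nil))     // unset: prints 0.1
+	fmt.Println(effectiveTolerance(&custom)) // set: prints 0.05
+
+	negative := -0.1
+	fmt.Println(validateTolerance(&negative) != nil) // rejected: prints true
+}
+```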
+
+Since there are separate `HPAScalingRules` objects defined for an HPA's
+`spec.behavior.scaleUp` and `spec.behavior.scaleDown`, it is possible to specify different
+`tolerance` values for scaling up vs. scaling down.
+
+[HPAScalingRules]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#hpascalingrules-v2-autoscaling
+
+### Risks and Mitigations
+
+There should be minimal risk introduced by the proposed changes:
+- The new field is optional, and its absence results in no changes to the current autoscaling behavior.
+- When specified, the new value doesn't change the autoscaling algorithm, but only overrides a single value used during the calculation. This value can already be changed via the `--horizontal-pod-autoscaler-tolerance` flag of the `kube-controller-manager`.
+- If a change to the new field results in undesirable behavior, the change can be reverted by deploying the previous version of the HPA resource, or by removing the `tolerance` field entirely.
+
+## Design Details
+
+The `HorizontalPodAutoscaler` API is updated to add a new `tolerance` field to the `HPAScalingRules` object:
+
+```golang
+type HPAScalingRules struct {
+	// tolerance is the tolerance on the ratio between the current and desired
+	// metric value under which no updates are made to the desired number of
+	// replicas.
+	// +optional
+	Tolerance *resource.Quantity
+
+	// Existing fields.
+	StabilizationWindowSeconds *int32
+	SelectPolicy               *ScalingPolicySelect
+	Policies                   []HPAScalingPolicy
+}
+```
+
+This new tolerance will be used in the autoscaling controller
+[replica_calculator.go][]. The current logic is:
+
+```golang
+if math.Abs(1.0-usageRatio) <= c.tolerance { /* ... */ }
+```
+
+It will be replaced by:
+
+```diff
+- if math.Abs(1.0-usageRatio) <= c.tolerance { /* ... */ }
++ // Down and Up scaling tolerances default to c.tolerance if unset.
++ downTolerance, upTolerance := c.tolerance, c.tolerance
++ if scaleDown.Tolerance != nil {
++     downTolerance = scaleDown.Tolerance.AsApproximateFloat64()
++ }
++ if scaleUp.Tolerance != nil {
++     upTolerance = scaleUp.Tolerance.AsApproximateFloat64()
++ }
++
++ if (1.0-downTolerance) <= usageRatio && usageRatio <= (1.0+upTolerance) { /* ... */ }
+```
+
+Since the added field is optional and its omission results in no change to the existing
+autoscaling behavior, this feature can be added to the current API
+version `pkg/apis/autoscaling/v2`.
+
+The feature presented in this KEP only allows users to tune an existing parameter, and
+as such requires no new HPA events or status fields. The validation logic
+will be updated to ensure that the `tolerance` field cannot be set to a negative value.
+
+[replica_calculator.go]: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/podautoscaler/replica_calculator.go
+
+### Test Plan
+
+[ ] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+##### Unit tests
+
+- `/pkg/apis/autoscaling/validation`: `2024-11-13` - `95.6`
+- `/pkg/controller/podautoscaler`: `2024-11-13` - `96.4`
+
+##### Integration tests
+
+- :
+
+##### e2e tests
+
+We will add the following [e2e autoscaling tests]:
+
+- For both scale up and scale down:
+  - Workload does not scale because the metric ratio is within tolerance.
+  - Workload scales successfully because the metric ratio is out of tolerance.
+- Autoscaling uses the default when no tolerances are set.
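+
+The scenarios above can be expressed as a table-driven check against the tolerance band from
+Design Details, `[1 - downTolerance, 1 + upTolerance]`. This is a hypothetical Go sketch using plain
+`float64` values and an illustrative `withinTolerance` helper; it is not the e2e test code, which
+exercises real workloads and metrics.
+
+```golang
+package main
+
+import "fmt"
+
+// withinTolerance is a hypothetical helper mirroring the proposed check:
+// scaling is skipped when usageRatio lies within [1-down, 1+up], where an
+// unset (nil) per-direction tolerance falls back to the global one.
+func withinTolerance(usageRatio, global float64, scaleDown, scaleUp *float64) bool {
+	down, up := global, global
+	if scaleDown != nil {
+		down = *scaleDown
+	}
+	if scaleUp != nil {
+		up = *scaleUp
+	}
+	return 1.0-down <= usageRatio && usageRatio <= 1.0+up
+}
+
+// ptr returns a pointer to f, standing in for an optional API field.
+func ptr(f float64) *float64 { return &f }
+
+func main() {
+	const global = 0.1 // stands in for --horizontal-pod-autoscaler-tolerance
+	cases := []struct {
+		name       string
+		usageRatio float64
+		down, up   *float64
+		wantSkip   bool
+	}{
+		{"scale up, in tolerance", 1.03, nil, ptr(0.05), true},
+		{"scale up, out of tolerance", 1.07, nil, ptr(0.05), false},
+		{"scale down, in tolerance", 0.97, ptr(0.05), nil, true},
+		{"scale down, out of tolerance", 0.93, ptr(0.05), nil, false},
+		{"no tolerances set, default used", 1.07, nil, nil, true},
+	}
+	for _, c := range cases {
+		got := withinTolerance(c.usageRatio, global, c.down, c.up)
+		if got != c.wantSkip {
+			fmt.Printf("FAIL %s: skip=%v, want %v\n", c.name, got, c.wantSkip)
+			continue
+		}
+		fmt.Printf("ok   %s\n", c.name)
+	}
+}
+```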
+
+[e2e autoscaling tests]: https://github.com/kubernetes/kubernetes/tree/master/test/e2e/autoscaling
+
+### Graduation Criteria
+
+#### Alpha
+
+- Feature implemented behind an `HPAConfigurableTolerance` feature gate
+- Initial e2e tests completed and enabled
+
+### Upgrade / Downgrade Strategy
+
+#### Upgrade
+Existing HPAs will continue to work as they do today, using the global `--horizontal-pod-autoscaler-tolerance`
+value from the `kube-controller-manager`. Users can opt in by enabling the `HPAConfigurableTolerance`
+feature gate (only needed while the feature is in alpha) and setting the new `tolerance` field on an HPA.
+
+#### Downgrade
+On downgrade, all HPAs will revert to using the global `--horizontal-pod-autoscaler-tolerance`
+value from the `kube-controller-manager`, regardless of any `tolerance` value configured on the HPA
+itself.
+
+### Version Skew Strategy
+
+1. `kube-apiserver`: Newer instances will accept the new `tolerance`
+   field, while older instances will ignore it.
+2. `kube-controller-manager`: An older version could receive an HPA containing
+   the new `tolerance` field from a newer API server, in which case it
+   would ignore it (i.e. scale as if it were not present).
+
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [x] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: `HPAConfigurableTolerance`
+  - Components depending on the feature gate: `kube-apiserver`, `kube-controller-manager`
+
+###### Does enabling the feature change any default behavior?
+
+No.
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+The feature can be disabled by restarting the `kube-controller-manager` with the feature gate set to `false`.
+
+Any `tolerance` values set on existing HPAs will be ignored by the `kube-controller-manager` when the feature gate is off.
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+When the feature is re-enabled, any HPAs with configured `tolerance` values will use those when calculating replica counts, rather than the global tolerance from the `kube-controller-manager`.
+
+###### Are there any tests for feature enablement/disablement?
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+###### What specific metrics should inform a rollback?
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
+
+###### How can someone using this feature know that it is working for their instance?
+
+- [ ] Events
+  - Event Reason:
+- [ ] API .status
+  - Condition name:
+  - Other field:
+- [ ] Other (treat as last resort)
+  - Details:
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+- [ ] Metrics
+  - Metric name:
+  - [Optional] Aggregation method:
+  - Components exposing the metric:
+- [ ] Other (treat as last resort)
+  - Details:
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+No.
+
+###### Will enabling / using this feature result in introducing new API types?
+
+No.
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No.
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+- This feature adds a new optional `tolerance` field (a `resource.Quantity`) to each of the two
+  `HPAScalingRules` objects (`spec.behavior.scaleUp` and `spec.behavior.scaleDown`) of
+  `HorizontalPodAutoscaler` `v2` objects. Users should expect the object size to increase by
+  a few bytes each time they set one of these fields.
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+No.
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+No.
+
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+###### What are other known failure modes?
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+## Implementation History
+
+- 2024-11-XX Provisional KEP merged
+
+## Drawbacks
+
+## Alternatives
+
+## Infrastructure Needed (Optional)
diff --git a/keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml b/keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml
new file mode 100644
index 00000000000..9b33ce92193
--- /dev/null
+++ b/keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml
@@ -0,0 +1,44 @@
+title: Configurable tolerance for HPA
+kep-number: 4951
+authors:
+  - "@pr00se"
+  - "@jm-franc"
+owning-sig: sig-autoscaling
+status: provisional
+creation-date: 2024-11-05
+reviewers:
+  - "@gjtempleton"
+  - "@raywainman"
+approvers:
+  - TBD
+
+see-also:
+  - "/keps/sig-autoscaling/853-configurable-hpa-scale-velocity"
+replaces:
+
+# The target maturity stage in the current dev cycle for this KEP.
+stage: alpha
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: TBD
+
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: TBD
+  beta: TBD
+  stable: TBD
+
+# The following PRR answers are required at alpha release
+# List the feature gate name and the components for which it must be enabled
+feature-gates:
+  - name: HPAConfigurableTolerance
+    components:
+      - kube-apiserver
+      - kube-controller-manager
+disable-supported: true
+
+# The following PRR answers are required at beta release
+#metrics:
+#  - my_feature_metric