diff --git a/keps/sig-autoscaling/5325-hpa-pod-selection-accuracy/README.md b/keps/sig-autoscaling/5325-hpa-pod-selection-accuracy/README.md
new file mode 100644
index 00000000000..91c61116c43
--- /dev/null
+++ b/keps/sig-autoscaling/5325-hpa-pod-selection-accuracy/README.md
@@ -0,0 +1,879 @@

# KEP-5325: HPA - Improve pod selection accuracy across workload types

- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

The Horizontal Pod Autoscaler (HPA) has a critical limitation in its pod selection mechanism: it collects metrics from all pods that match the target workload's label selector, regardless of whether those pods are actually managed by the target workload.
This can lead to incorrect scaling decisions when unrelated pods (such as Jobs, CronJobs, or other Deployments) happen to share the same labels.

This often results in unexpected behavior such as:

* HPAs stuck at `maxReplicas` despite low actual usage in the target workload
* Unnecessary scaling events triggered by temporary workloads
* Unpredictable scaling behavior that's difficult to diagnose

This proposal adds a parameter to HPAs that ensures the HPA only considers pods that are actually owned by the target workload (through owner references), rather than all pods matching the label selector.

## Motivation

Consider this example:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: 100m
---
apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
spec:
  template:
    metadata:
      labels:
        app: test-app # Same label as deployment
        workload: scraper
    spec:
      containers:
      - name: cpu-load
        image: busybox
        command: ["dd", "if=/dev/zero", "of=/dev/null"]
        resources:
          requests:
            cpu: 100m
      restartPolicy: Never
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
In this case, the HPA will factor in CPU consumption from the Job's pod despite it not being part of the Deployment, potentially causing incorrect scaling decisions.
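The label-overlap problem above can be reproduced in a few lines. The sketch below (plain Go with simplified stand-in types, not the real Kubernetes API types or controller code) mimics the HPA's current selection: it checks only labels, so the Job's pod is swept into the metric calculation alongside the Deployment's pod:

```go
package main

import "fmt"

// Pod is a simplified stand-in for the objects in the example above — not the
// real Kubernetes API type.
type Pod struct {
	Name   string
	Labels map[string]string
}

// matchesSelector mimics the HPA's current behavior: a pod is counted if its
// labels contain every key/value pair of the target's label selector.
func matchesSelector(p Pod, selector map[string]string) bool {
	for k, v := range selector {
		if p.Labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// The Deployment's selector from the manifest above.
	selector := map[string]string{"app": "test-app"}

	deployPod := Pod{Name: "test-app-7d4b9c-xyz", Labels: map[string]string{"app": "test-app"}}
	jobPod := Pod{Name: "test-job-abc", Labels: map[string]string{"app": "test-app", "workload": "scraper"}}

	// Both pods match, so the Job's CPU usage is folded into the HPA's math.
	fmt.Println(matchesSelector(deployPod, selector)) // true
	fmt.Println(matchesSelector(jobPod, selector))    // true
}
```

The pod names are hypothetical; the point is that label matching alone cannot distinguish the Deployment's pod from the Job's pod.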
### Goals

* Improve the accuracy of HPA's pod selection to only include pods directly managed by the target workload
* Maintain backward compatibility with existing HPA configurations
* Provide clear visibility into which pods are being considered for scaling decisions
* Allow users to choose between selection strategies based on their needs

### Non-Goals

* Modifying how metrics are collected from pods
* Changing the scaling algorithm itself
* Addressing other HPA limitations not related to pod selection

## Proposal

We propose adding a new field to the HPA specification called `strictPodSelection` that allows users to specify how pods should be selected for metric collection:
* If set to `true`, only pods that are actually owned by the target workload (through owner references) are selected.
* If not set, or set to `false`, the current label-based selection behavior is used.

The default value will be `false`, preserving backward compatibility with existing HPAs.

### Risks and Mitigations

* Backward compatibility: Mitigated by making the new behavior opt-in with the current behavior as default.
* User confusion: We'll provide clear documentation on when and how to use each strategy.

## Design Details

The HPA specification (v2) will be extended with a new boolean field:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  # Existing fields...
  strictPodSelection: true # Default: false
```
When the HPA controller processes an HPA resource:

* If `strictPodSelection` is not specified or set to `false`:
  * The controller will use the current behavior, selecting all pods that match the target workload's label selector
  * This maintains backward compatibility with existing HPAs

* If `strictPodSelection` is set to `true`:
  * The controller will identify the target workload (e.g., Deployment)
  * It will traverse the ownership chain (e.g., Deployment → ReplicaSet → Pods)
  * Only pods that are owned by the target workload through this chain will be included in metric collection
  * Pods that match labels but aren't in the ownership chain will be excluded

Additionally, the HPA status will be enhanced to include information about the pod selection:
```yaml
status:
  # Existing fields...
  podSelectionInfo:
    strategy: "Strict" # or "Label" when strictPodSelection is false
```

### Test Plan

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

None required.
##### Unit tests

- ``: `` - ``

##### Integration tests

- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/integration/...): [integration master](https://testgrid.k8s.io/sig-release-master-blocking#integration-master?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature)

##### e2e tests

We will add the following e2e tests:
- Test scaling with a deployment and an unrelated job sharing labels (with and without `strictPodSelection`)
- Test scaling with multiple workload types that share label selectors

[e2e autoscaling tests]: https://github.com/kubernetes/kubernetes/tree/master/test/e2e/autoscaling

### Graduation Criteria

#### Alpha

- Feature implemented behind a feature gate (`HPAStrictPodSelection`)
- Initial e2e tests completed and enabled

### Upgrade / Downgrade Strategy

#### Upgrade

Existing HPAs will continue to work as they do today, using label-based pod selection regardless of pod ownership. Users can opt into the new feature by enabling the `HPAStrictPodSelection` feature gate (alpha only) and setting the new `strictPodSelection` field to `true` on an HPA.

#### Downgrade

On downgrade, all HPAs will revert to using label-based pod selection, regardless of any configured `strictPodSelection` value on the HPA itself.

### Version Skew Strategy

1. `kube-apiserver`: More recent instances will accept and persist the new `strictPodSelection` field. Older instances do not recognize the field and will silently drop it on write, so the HPA is persisted without it.
2. `kube-controller-manager`: An older version could receive an HPA containing the new `strictPodSelection` field from a more recent API server, in which case it would ignore it (i.e., continue to use label-based pod selection regardless of the field's value).
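For illustration only, the Deployment → ReplicaSet → Pod ownership traversal described in Design Details could look roughly like the following sketch (hypothetical simplified types, not the actual controller implementation or the real `OwnerReference` API type):

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for metav1.OwnerReference.
type OwnerRef struct{ Kind, Name string }

type Pod struct {
	Name   string
	Owners []OwnerRef
}

type ReplicaSet struct {
	Name   string
	Owners []OwnerRef
}

// ownedByDeployment walks the Deployment → ReplicaSet → Pod ownership chain:
// a pod is kept only if one of its owners is a ReplicaSet that is in turn
// owned by the target Deployment.
func ownedByDeployment(p Pod, deployment string, replicaSets []ReplicaSet) bool {
	for _, o := range p.Owners {
		if o.Kind != "ReplicaSet" {
			continue
		}
		for _, rs := range replicaSets {
			if rs.Name != o.Name {
				continue
			}
			for _, ro := range rs.Owners {
				if ro.Kind == "Deployment" && ro.Name == deployment {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	replicaSets := []ReplicaSet{
		{Name: "test-app-7d4b9c", Owners: []OwnerRef{{Kind: "Deployment", Name: "test-app"}}},
	}
	deployPod := Pod{Name: "test-app-7d4b9c-xyz", Owners: []OwnerRef{{Kind: "ReplicaSet", Name: "test-app-7d4b9c"}}}
	jobPod := Pod{Name: "test-job-abc", Owners: []OwnerRef{{Kind: "Job", Name: "test-job"}}}

	// With strict selection, only the Deployment's pod survives the filter,
	// even though both pods carry the same labels.
	fmt.Println(ownedByDeployment(deployPod, "test-app", replicaSets)) // true
	fmt.Println(ownedByDeployment(jobPod, "test-app", replicaSets))    // false
}
```

The names and helper signature are invented for the sketch; the real controller would resolve owners via the API server or informer caches rather than an in-memory slice.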
## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: HPAStrictPodSelection
  - Components depending on the feature gate: `kube-controller-manager` and
    `kube-apiserver`.

###### Does enabling the feature change any default behavior?

No. By default, HPAs will continue to use label-based pod selection unless the new `strictPodSelection` field is explicitly set to `true`.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. If the feature gate is disabled, all HPAs will revert to using label-based pod selection regardless of the value of the `strictPodSelection` field.

###### What happens if we reenable the feature if it was previously rolled back?

When the feature is re-enabled, any HPAs with `strictPodSelection: true` will resume using the strict ownership-based pod selection rather than label-based selection. The HPA controller will immediately begin considering only pods directly owned by the target workload for scaling decisions on these HPAs, potentially changing scaling behavior compared to when the feature was disabled.

Existing HPAs that don't have `strictPodSelection` explicitly set will continue using label-based selection and won't be affected by re-enabling the feature.

###### Are there any tests for feature enablement/disablement?

We will add a unit test verifying that HPAs with and without the new `strictPodSelection` field are properly validated, both when the feature gate is enabled and when it is disabled. This will ensure the HPA controller correctly applies the pod selection strategy based on the feature gate status and the presence of the field.

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?
Rollout failures in this feature are unlikely to impact running workloads significantly, but there are edge cases to consider:

- If the feature is enabled during a high-traffic period, HPAs with `strictPodSelection: true` might suddenly change their scaling decisions based on the reduced pod set. This could cause unexpected scaling events.
- If a `kube-controller-manager` restarts mid-rollout, some HPAs might temporarily revert to label-based selection until the controller fully initializes with the new feature enabled.

These issues would only affect HPAs that have explicitly set `strictPodSelection: true`. Existing HPAs will continue to function with the default label-based pod selection behavior.

###### What specific metrics should inform a rollback?

Operators should monitor these signals that might indicate problems:

- Unexpected scaling events shortly after enabling the feature
- Significant changes in the number of replicas for workloads using HPAs with `strictPodSelection: true`
- Increased latency in the `horizontal_pod_autoscaler_controller_metric_computation_duration_seconds` metric
- Increased error rate in `horizontal_pod_autoscaler_controller_metric_computation_total` with error status

If these metrics show unusual patterns after enabling the feature, operators should consider rolling back.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No. This feature only adds a new optional field to the HPA API and doesn't deprecate or remove any existing functionality. All current HPA behaviors remain unchanged unless users explicitly opt into the new selection mode.

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?
The presence of the `strictPodSelection: true` field in HPA specifications indicates that the feature is in use. Additionally, the HPA status will include information about the pod selection strategy in use through the `podSelectionInfo` field, which can be examined to determine if strict pod selection is active for a given HPA.

###### How can someone using this feature know that it is working for their instance?

- [x] API .status
  - Other field: The HPA status will be enhanced to include pod selection information
    ```yaml
    status:
      podSelectionInfo:
        strategy: "Strict" # or "Label" when strictPodSelection is false
    ```

Additionally, verbose controller logs will show which pods were included or excluded from metric calculations due to the strict selection policy when troubleshooting is needed.

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

This feature doesn't fundamentally change how the HPA controller operates; it refines which pods are included in metric calculations. Therefore, existing metrics for monitoring HPA controller health remain applicable. Standard HPA metrics (e.g. `horizontal_pod_autoscaler_controller_metric_computation_duration_seconds`) can be used to verify the HPA controller's health.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

No additional metrics have been identified at this time.

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

No. The feature relies only on the existing `kube-apiserver` and `kube-controller-manager` components.

### Scalability

###### Will enabling / using this feature result in any new API calls?

No.

###### Will enabling / using this feature result in introducing new API types?

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

No.
###### Will enabling / using this feature result in increasing size or count of the existing API objects?

Yes.
- HorizontalPodAutoscaler objects will increase in size by a few dozen bytes for the serialized `strictPodSelection` field when it is specified
- The status will include additional pod selection information (approximately 50-100 bytes)
- No additional API objects will be created

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No.

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

###### What are other known failure modes?

###### What steps should be taken if SLOs are not being met to determine the problem?

## Implementation History

## Drawbacks

## Alternatives

## Infrastructure Needed (Optional)

diff --git a/keps/sig-autoscaling/5325-hpa-pod-selection-accuracy/kep.yaml b/keps/sig-autoscaling/5325-hpa-pod-selection-accuracy/kep.yaml
new file mode 100644
index 00000000000..d5b40d6fb5f
--- /dev/null
+++ b/keps/sig-autoscaling/5325-hpa-pod-selection-accuracy/kep.yaml
@@ -0,0 +1,47 @@
title: HPA - Improve pod selection accuracy across workload types
kep-number: 5325
authors:
  - "@omerap12"
  - "@adrianmoisey"
owning-sig: sig-autoscaling
participating-sigs: []
status: provisional #|implementable|implemented|deferred|rejected|withdrawn|replaced
creation-date: 2025-05-21
reviewers:
  - TBD
approvers:
  - TBD

see-also: []
replaces: []

# The target maturity stage in the current dev cycle for this KEP.
# If the purpose of this KEP is to deprecate a user-visible feature
# and a Deprecated feature gate is added, it should be deprecated|disabled|removed.
stage: alpha #|beta|stable

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.34"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.34"
  beta: "v1.35"
  stable: "v1.36"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: HPAStrictPodSelection
    components:
      - kube-apiserver
      - kube-controller-manager
disable-supported: true

# The following PRR answers are required at beta release
metrics:
  - horizontal_pod_autoscaler_controller_metric_computation_duration_seconds