docs: Document acceleratorName resolution on VariantAutoscaling #990
Open
jia-gao wants to merge 2 commits into llm-d:main
Document how WVA resolves the accelerator name for a VariantAutoscaling: first via auto-discovery from the target Deployment's nodeSelector / nodeAffinity (nvidia.com/gpu.product, amd.com/gpu.product-name, cloud.google.com/gke-accelerator), then falling back to the inference.optimization/acceleratorName label on the VA.

This resolution logic was added in llm-d#882 but was never explicitly documented in user-facing docs. The net behavior is that a VA without either source silently skips metric emission, a common cause of "WVA is deployed but nothing scales".

Changes:
- docs/user-guide/configuration.md: New "Accelerator Name Resolution" subsection with the two-step lookup order, failure log signature, and a full example
- docs/user-guide/troubleshooting.md: New section "Accelerator Name Cannot Be Resolved" with diagnostic kubectl commands for both the scale target and the VA, and resolution via either nodeSelector or the fallback label
- deploy/README.md: Info callout on the manual VA creation example explaining the resolution order
- config/samples/variantautoscaling-integration.yaml: Comment annotating the label as a fallback (not a hard requirement)

Closes llm-d#855
ad59c6e to
8c40ed5
ev-shindin (Collaborator) requested changes on Apr 13, 2026 and left a comment:
Good documentation PR that fills a real gap — accelerator resolution was undocumented despite being a common source of silent failures. Please address review comments.
Changes based on review by @ev-shindin:
1. configuration.md: Correct "skip status updates": WVA still sets the MetricsAvailable=False condition; it only skips the full allocation and metric emission
2. troubleshooting.md: Same correction, plus add a kubectl command to check the MetricsAvailable condition as the first diagnostic step
3. troubleshooting.md: Add a nodeAffinity diagnostic command alongside nodeSelector (the code checks both requiredDuringScheduling and preferredDuringScheduling)
4. troubleshooting.md: Change "explicit override" to "fallback" for the VA label, since it does not override auto-discovery
Contributor
Author
/ok-to-test
Collaborator
/ok-to-test
Contributor
🚀 Kind E2E (full) triggered by
Contributor
🚀 OpenShift E2E — approve and run ( |
Contributor
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
Summary
Document how WVA resolves the accelerator name for a VariantAutoscaling. Closes #855.

Background
After #882 merged, WVA resolves the accelerator name in two steps:
1. Auto-discovery from the scale target's nodeSelector/nodeAffinity for GPU product keys:
   - nvidia.com/gpu.product
   - amd.com/gpu.product-name
   - cloud.google.com/gke-accelerator
2. Fallback to the inference.optimization/acceleratorName label on the VA

If both fail, WVA sets the MetricsAvailable=False condition but skips the full allocation and metric emission for the variant, so HPA/KEDA never receives a scaling signal. The failure is silent at the API level (the VA is accepted), but the controller log contains:

accelerator name not found in scale target nodeSelector/nodeAffinity or VA label

This resolution logic was added in #882 but was never explicitly documented in user-facing docs, so users who hit the failure mode had no way to diagnose or fix it without reading source. Thanks to @ev-shindin for the #882 context on the issue thread.
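
For readers who want the lookup order in code form, here is a minimal Go sketch of the two-step resolution described above. It is not the WVA implementation: `resolveAcceleratorName` and `affinityTerm` are hypothetical names, the nodeAffinity terms are flattened into simple key/values pairs, and both the order in which keys are tried and the use of the first affinity value are assumptions. Only the three node label keys and the fallback label name come from this PR.

```go
package main

import "fmt"

// Node label keys checked during auto-discovery (as listed in this PR).
var gpuProductKeys = []string{
	"nvidia.com/gpu.product",
	"amd.com/gpu.product-name",
	"cloud.google.com/gke-accelerator",
}

// Fallback label on the VariantAutoscaling (as named in this PR).
const acceleratorLabel = "inference.optimization/acceleratorName"

// affinityTerm is a hypothetical, flattened stand-in for one nodeAffinity
// match expression: a label key and its candidate values.
type affinityTerm struct {
	Key    string
	Values []string
}

// resolveAcceleratorName is a hypothetical helper, not the WVA function.
// Step 1: look for a GPU product key in the scale target's nodeSelector,
// then in its nodeAffinity terms (assumed order).
// Step 2: fall back to the label on the VA.
// The boolean is false when neither source yields a name.
func resolveAcceleratorName(nodeSelector map[string]string, affinity []affinityTerm, vaLabels map[string]string) (string, bool) {
	for _, key := range gpuProductKeys {
		if v := nodeSelector[key]; v != "" {
			return v, true
		}
		for _, t := range affinity {
			// Assumption: take the first listed value of a matching term.
			if t.Key == key && len(t.Values) > 0 && t.Values[0] != "" {
				return t.Values[0], true
			}
		}
	}
	if v := vaLabels[acceleratorLabel]; v != "" {
		return v, true
	}
	return "", false
}

func main() {
	// Both sources are set: auto-discovery takes priority over the label.
	name, ok := resolveAcceleratorName(
		map[string]string{"nvidia.com/gpu.product": "NVIDIA-A100-SXM4-80GB"},
		nil,
		map[string]string{acceleratorLabel: "H100"},
	)
	fmt.Println(name, ok)
}
```

The sketch returns a boolean instead of an error so that the "neither source is set" case, which is the one that produces the log line above, is easy to spot at the call site.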
Changes
- docs/user-guide/configuration.md: New "Accelerator Name Resolution" subsection under "Enabling Autoscaling for a Model Deployment". Explains the two-step lookup, the failure log signature, the Helm note, and includes a complete manual YAML example.
- docs/user-guide/troubleshooting.md: New section "Accelerator Name Cannot Be Resolved" under "Why is my VariantAutoscaling not being reconciled?" with:
  - kubectl commands to inspect the scale target nodeSelector
  - kubectl command to inspect the VA fallback label
- deploy/README.md: Info callout on the existing manual VA creation example explaining the resolution order. Links to the new configuration.md section.
- config/samples/variantautoscaling-integration.yaml: Inline comment annotating the label as a fallback, not a hard requirement, with the auto-discovery keys spelled out.

Out of scope
- docs/user-guide/crd-reference.md is generated from CRD types (CRD_OUTPUT in the Makefile) and cannot be edited directly.
- "TODO: remove this checks when we will move to a new version of the CRD with required accelerator field" at internal/engines/saturation/engine.go still applies as a future direction but is out of scope for this doc-only fix.

Note on earlier framing
An earlier revision of this PR framed the label as required. That was accurate pre-#882 and matched the original issue description, but after #882 the label is a fallback — auto-discovery takes priority. All text has been rewritten to reflect the current behavior.