
docs: Document acceleratorName resolution on VariantAutoscaling#990

Open
jia-gao wants to merge 2 commits into llm-d:main from jia-gao:docs/accelerator-label-855

Conversation

@jia-gao (Contributor) commented Apr 12, 2026

Summary

Document how WVA resolves the accelerator name for a VariantAutoscaling. Closes #855.

Background

After #882 merged, WVA resolves the accelerator name in two steps:

  1. Auto-discovery from the target Deployment/LWS pod template by reading nodeSelector / nodeAffinity for GPU product keys:
    • nvidia.com/gpu.product
    • amd.com/gpu.product-name
    • cloud.google.com/gke-accelerator
  2. Fallback to the inference.optimization/acceleratorName label on the VA

If both fail, WVA sets the MetricsAvailable=False condition but skips allocation and metric emission for the variant — HPA/KEDA never receives a scaling signal. The failure is silent at the API level (the VA is accepted), but the controller log contains "accelerator name not found in scale target nodeSelector/nodeAffinity or VA label".
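For the auto-discovery path, a minimal sketch of what WVA looks for on the scale target — the Deployment name and GPU product value below are illustrative placeholders, not taken from the repo, and the pod template is trimmed to the relevant field:

```yaml
# Illustrative Deployment fragment: WVA auto-discovers the accelerator name
# from a GPU product key in the pod template's nodeSelector (or an equivalent
# nodeAffinity term). Names and values here are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-model-server
spec:
  template:
    spec:
      nodeSelector:
        # Any one of the three supported keys works; this is the NVIDIA one.
        nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
```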

This resolution logic was added in #882 but was never explicitly documented in user-facing docs, so users who hit the failure mode had no way to diagnose or fix it without reading the source. Thanks to @ev-shindin for the #882 context on the issue thread.

Changes

  • docs/user-guide/configuration.md — New "Accelerator Name Resolution" subsection under "Enabling Autoscaling for a Model Deployment". Explains the two-step lookup, the failure log signature, the Helm note, and includes a complete manual YAML example.

  • docs/user-guide/troubleshooting.md — New section "Accelerator Name Cannot Be Resolved" under "Why is my VariantAutoscaling not being reconciled?" with:

    • kubectl commands to inspect the scale target nodeSelector
    • kubectl command to inspect the VA fallback label
    • Two log signatures users will see
    • Resolution via either pinning GPU type on the Deployment or setting the label

  • deploy/README.md — Info callout on the existing manual VA creation example explaining the resolution order. Links to the new configuration.md section.

  • config/samples/variantautoscaling-integration.yaml — Inline comment annotating the label as a fallback, not a hard requirement, with the auto-discovery keys spelled out.
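Taken together, the fallback path on the VA itself might look like the following sketch — the apiVersion/group and metadata names are placeholders (check the installed CRD); only the label key comes from the resolution logic described above:

```yaml
# Sketch of a VariantAutoscaling carrying the fallback label.
apiVersion: llmd.ai/v1alpha1   # placeholder — use your installed CRD's group/version
kind: VariantAutoscaling
metadata:
  name: my-model-variant
  labels:
    # Fallback only: consulted when no GPU product key is found in the
    # scale target's nodeSelector/nodeAffinity.
    inference.optimization/acceleratorName: NVIDIA-A100-SXM4-80GB
```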

Out of scope

  • docs/user-guide/crd-reference.md is generated from CRD types (CRD_OUTPUT in the Makefile) and cannot be edited directly.
  • The existing TODO ("remove this checks when we will move to a new version of the CRD with required accelerator field") at internal/engines/saturation/engine.go still applies as a future direction but is out of scope for this doc-only fix.

Note on earlier framing

An earlier revision of this PR framed the label as required. That was accurate pre-#882 and matched the original issue description, but after #882 the label is a fallback — auto-discovery takes priority. All text has been rewritten to reflect the current behavior.

Document how WVA resolves the accelerator name for a VariantAutoscaling:
first via auto-discovery from the target Deployment's nodeSelector /
nodeAffinity (nvidia.com/gpu.product, amd.com/gpu.product-name,
cloud.google.com/gke-accelerator), then falling back to the
inference.optimization/acceleratorName label on the VA.

This resolution logic was added in llm-d#882 but was never explicitly
documented in user-facing docs. The net behavior is that a VA without
either source silently skips metric emission — a common cause of
"WVA is deployed but nothing scales".

Changes:
- docs/user-guide/configuration.md: New "Accelerator Name Resolution"
  subsection with the two-step lookup order, failure log signature,
  and a full example
- docs/user-guide/troubleshooting.md: New section "Accelerator Name
  Cannot Be Resolved" with diagnostic kubectl commands for both the
  scale target and the VA, and resolution via either nodeSelector or
  the fallback label
- deploy/README.md: Info callout on the manual VA creation example
  explaining the resolution order
- config/samples/variantautoscaling-integration.yaml: Comment
  annotating the label as a fallback (not a hard requirement)

Closes llm-d#855
jia-gao force-pushed the docs/accelerator-label-855 branch from ad59c6e to 8c40ed5 on April 12, 2026 at 00:25
jia-gao changed the title from "docs: Document required acceleratorName label on VariantAutoscaling" to "docs: Document acceleratorName resolution on VariantAutoscaling" on Apr 12, 2026
@ev-shindin (Collaborator) left a comment

Good documentation PR that fills a real gap — accelerator resolution was undocumented despite being a common source of silent failures. Please address review comments.

Review threads:
  • docs/user-guide/troubleshooting.md (3 threads, 1 outdated)
  • docs/user-guide/configuration.md (1 thread, outdated)
Changes based on review by @ev-shindin:

1. configuration.md: Correct "skip status updates" — WVA still sets
   MetricsAvailable=False condition; it only skips the full allocation
   and metric emission
2. troubleshooting.md: Same correction, plus add kubectl command to
   check the MetricsAvailable condition as the first diagnostic step
3. troubleshooting.md: Add nodeAffinity diagnostic command alongside
   nodeSelector (the code checks both requiredDuringScheduling and
   preferredDuringScheduling)
4. troubleshooting.md: Change "explicit override" to "fallback" for
   the VA label — it does not override auto-discovery
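As a sketch of what the diagnostic step in item 2 surfaces — the exact condition fields and reason are assumptions based on standard Kubernetes condition conventions, and the resource name passed to kubectl may differ on your cluster:

```yaml
# What an affected VA's status might look like, e.g. via:
#   kubectl get variantautoscaling <name> -o yaml
# (resource name, reason, and field layout are assumptions; the message
# string mirrors the controller log signature quoted above)
status:
  conditions:
    - type: MetricsAvailable
      status: "False"
      reason: AcceleratorNameNotResolved   # placeholder reason
      message: accelerator name not found in scale target nodeSelector/nodeAffinity or VA label
```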
@jia-gao (Contributor, Author) commented Apr 15, 2026

/ok-to-test

@ev-shindin (Collaborator) left a comment
Great work! Thanks!

@ev-shindin (Collaborator)

/ok-to-test

@github-actions

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 45 | 5 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |



Development

Successfully merging this pull request may close these issues.

acceleratorName label requirement on VariantAutoscaling is not documented

2 participants