
docs: Document acceleratorName resolution on VariantAutoscaling#990

Open
jia-gao wants to merge 2 commits into llm-d:main from jia-gao:docs/accelerator-label-855

Conversation

@jia-gao (Contributor) commented Apr 12, 2026

Summary

Document how WVA resolves the accelerator name for a VariantAutoscaling. Closes #855.

Background

After #882 merged, WVA resolves the accelerator name in two steps:

  1. Auto-discovery from the target Deployment/LWS pod template by reading nodeSelector / nodeAffinity for GPU product keys:
    • nvidia.com/gpu.product
    • amd.com/gpu.product-name
    • cloud.google.com/gke-accelerator
  2. Fallback to the inference.optimization/acceleratorName label on the VA

If both fail, WVA sets the MetricsAvailable=False condition but skips allocation and metric emission for the variant — HPA/KEDA never receives a scaling signal. The failure is silent at the API level (the VA is accepted), but the controller log contains "accelerator name not found in scale target nodeSelector/nodeAffinity or VA label".
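For the auto-discovery path, a minimal sketch of what WVA looks for on the scale target — the Deployment name and GPU product value below are illustrative placeholders, not taken from the repo, and the pod template is trimmed to the relevant field:

```yaml
# Illustrative Deployment fragment: WVA auto-discovers the accelerator name
# from a GPU product key in the pod template's nodeSelector (or an equivalent
# nodeAffinity term). Names and values here are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-model-server
spec:
  template:
    spec:
      nodeSelector:
        # Any one of the three supported keys works; this is the NVIDIA one.
        nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
```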

This resolution logic was added in #882 but was never explicitly documented in user-facing docs, so users who hit the failure mode had no way to diagnose or fix it without reading the source. Thanks to @ev-shindin for the #882 context on the issue thread.

Changes

  • docs/user-guide/configuration.md — New "Accelerator Name Resolution" subsection under "Enabling Autoscaling for a Model Deployment". Explains the two-step lookup, the failure log signature, the Helm note, and includes a complete manual YAML example.

  • docs/user-guide/troubleshooting.md — New section "Accelerator Name Cannot Be Resolved" under "Why is my VariantAutoscaling not being reconciled?" with:

    • kubectl commands to inspect the scale target nodeSelector
    • kubectl command to inspect the VA fallback label
    • Two log signatures users will see
    • Resolution via either pinning GPU type on the Deployment or setting the label

  • deploy/README.md — Info callout on the existing manual VA creation example explaining the resolution order. Links to the new configuration.md section.

  • config/samples/variantautoscaling-integration.yaml — Inline comment annotating the label as a fallback, not a hard requirement, with the auto-discovery keys spelled out.
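Taken together, the fallback path on the VA itself might look like the following sketch — the apiVersion/group and metadata names are placeholders (check the installed CRD); only the label key comes from the resolution logic described above:

```yaml
# Sketch of a VariantAutoscaling carrying the fallback label.
apiVersion: llmd.ai/v1alpha1   # placeholder — use your installed CRD's group/version
kind: VariantAutoscaling
metadata:
  name: my-model-variant
  labels:
    # Fallback only: consulted when no GPU product key is found in the
    # scale target's nodeSelector/nodeAffinity.
    inference.optimization/acceleratorName: NVIDIA-A100-SXM4-80GB
```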

Out of scope

  • docs/user-guide/crd-reference.md is generated from CRD types (CRD_OUTPUT in the Makefile) and cannot be edited directly.
  • The existing TODO ("remove this checks when we will move to a new version of the CRD with required accelerator field") at internal/engines/saturation/engine.go still applies as a future direction but is out of scope for this doc-only fix.

Note on earlier framing

An earlier revision of this PR framed the label as required. That was accurate pre-#882 and matched the original issue description, but after #882 the label is a fallback — auto-discovery takes priority. All text has been rewritten to reflect the current behavior.

Document how WVA resolves the accelerator name for a VariantAutoscaling:
first via auto-discovery from the target Deployment's nodeSelector /
nodeAffinity (nvidia.com/gpu.product, amd.com/gpu.product-name,
cloud.google.com/gke-accelerator), then falling back to the
inference.optimization/acceleratorName label on the VA.

This resolution logic was added in llm-d#882 but was never explicitly
documented in user-facing docs. The net behavior is that a VA without
either source silently skips metric emission — a common cause of
"WVA is deployed but nothing scales".

Changes:
- docs/user-guide/configuration.md: New "Accelerator Name Resolution"
  subsection with the two-step lookup order, failure log signature,
  and a full example
- docs/user-guide/troubleshooting.md: New section "Accelerator Name
  Cannot Be Resolved" with diagnostic kubectl commands for both the
  scale target and the VA, and resolution via either nodeSelector or
  the fallback label
- deploy/README.md: Info callout on the manual VA creation example
  explaining the resolution order
- config/samples/variantautoscaling-integration.yaml: Comment
  annotating the label as a fallback (not a hard requirement)

Closes llm-d#855
jia-gao force-pushed the docs/accelerator-label-855 branch from ad59c6e to 8c40ed5 on April 12, 2026 at 00:25
jia-gao changed the title from "docs: Document required acceleratorName label on VariantAutoscaling" to "docs: Document acceleratorName resolution on VariantAutoscaling" on Apr 12, 2026
@ev-shindin (Collaborator) left a comment

Good documentation PR that fills a real gap — accelerator resolution was undocumented despite being a common source of silent failures. Please address review comments.

Review threads:
  • docs/user-guide/troubleshooting.md (3 threads, 1 outdated)
  • docs/user-guide/configuration.md (1 thread, outdated)
Changes based on review by @ev-shindin:

1. configuration.md: Correct "skip status updates" — WVA still sets
   MetricsAvailable=False condition; it only skips the full allocation
   and metric emission
2. troubleshooting.md: Same correction, plus add kubectl command to
   check the MetricsAvailable condition as the first diagnostic step
3. troubleshooting.md: Add nodeAffinity diagnostic command alongside
   nodeSelector (the code checks both requiredDuringScheduling and
   preferredDuringScheduling)
4. troubleshooting.md: Change "explicit override" to "fallback" for
   the VA label — it does not override auto-discovery
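As a sketch of what the diagnostic step in item 2 surfaces — the exact condition fields and reason are assumptions based on standard Kubernetes condition conventions, and the resource name passed to kubectl may differ on your cluster:

```yaml
# What an affected VA's status might look like, e.g. via:
#   kubectl get variantautoscaling <name> -o yaml
# (resource name, reason, and field layout are assumptions; the message
# string mirrors the controller log signature quoted above)
status:
  conditions:
    - type: MetricsAvailable
      status: "False"
      reason: AcceleratorNameNotResolved   # placeholder reason
      message: accelerator name not found in scale target nodeSelector/nodeAffinity or VA label
```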
@jia-gao (Contributor, Author) commented Apr 15, 2026

/ok-to-test

@ev-shindin (Collaborator) left a comment
Great work! Thanks!

@ev-shindin (Collaborator)

/ok-to-test

@github-actions

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 45 | 5 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |



Development

Successfully merging this pull request may close these issues.

acceleratorName label requirement on VariantAutoscaling is not documented

2 participants