Installation doc: introduce infrastructure, WVA controller, WVA variant installations, and consolidate install instructions #876
Conversation
| -f config/samples/prometheus-adapter-values-ocp.yaml | ||
| export HF_TOKEN="hf_xxxxx" | ||
| cd $WVA_PROJECT | ||
| ./deploy/install.sh # SUMH: can we move it to a new 'scripts' folder and rename to install-infra.sh? |
I wonder if llm-d provides such a script already. We aren't the only project to install llm-d, are we?
Yep, we have a function deploy_llm_d_infrastructure that downloads llm-d, runs a couple of scripts from llm-d, and does a number of other things.
|
@lionelvillard, this is ready for review. Sorry for the many changes but I think we need to make these changes in one step. Thanks. |
|
Do we know what is pending in this PR? Can we squash commits? |
|
|
||
| ```bash | ||
| export HF_TOKEN="hf_xxxxx" | ||
| export ENVIRONMENT="kind-emulator" # or "openshift", "kubernetes" |
nit: I vote to drop Kubernetes.
ok, once we decide, I can remove its support from the installation to reduce the code (in the following PR)
Agree with @asm582. This script is only used for e2e tests and we currently only support kind and openshift
ok, I will remove from README.md now. We can remove related code from install.sh later.
|
@lionelvillard pls review. We can review off-line as well. Thanks. |
Signed-off-by: Sum Huynh <31661254+shuynh2017@users.noreply.github.com>
|
/ok-to-test |
|
/trigger-e2e-full |
|
🚀 OpenShift E2E — approve and run ( |
|
🚀 Kind E2E (full) triggered by |
|
|
||
| Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM inference workloads | ||
| ## Installation Overview | ||
| Installation is divided into three separate parts. First is the infrastructure installation, where WVA prerequisites such as llm-d, the gateway control plane, Prometheus, the Prometheus adapter, and KEDA are installed and configured to work with WVA on environments such as OpenShift or Kubernetes. Next, a WVA controller can be installed in a namespace. Finally, for each WVA controller, one or more WVA variants can be installed as scaling targets for model deployments in llm-d namespaces. |
infrastructure installation is out-of-scope. Instead add a prerequisite section including llm-d infrastructure as a required prerequisite.
@lionelvillard, in the pre-req section, do we want to mention using install.sh or just mention the components need to be there? If we don't mention install.sh then we are on a good path to reduce it to just for developer (kind install).
install.sh should be bubbled only in the developer guides and not exposed to the user guides is the general thinking.
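The three-part flow described in the PR's overview could be sketched roughly as follows. This is an illustrative outline only: the chart paths and namespaces below are assumptions, not confirmed values from the chart.

```shell
# Illustrative sketch — chart locations and namespaces are assumptions.
# 1) Infrastructure prerequisites (llm-d, Prometheus + adapter, KEDA) are
#    installed first; per this review, that step lives in the developer guides.
# 2) Install the WVA controller into its own namespace:
helm install workload-variant-autoscaler ./workload-variant-autoscaler \
  -n workload-variant-autoscaler-system --create-namespace
# 3) For each controller, install one or more WVA variants as scale targets
#    in the llm-d namespaces that run the models:
helm install my-model-a-variant ./workload-variant-autoscaler-variant \
  -n team-a
```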
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
|
| settings, manually append the values in `config/samples/prometheus-adapter-values-ocp.yaml` | ||
| then run helm upgrade with the appended values. Here's an example how to get the current | ||
| values: `kubectl get configmap prometheus-adapter -n $MON_NS -o yaml` | ||
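The append-then-upgrade step described above might look like the following sketch. The release name, chart reference, and namespace are assumptions, and `helm get values` is used here as an alternative to reading the ConfigMap directly.

```shell
# Sketch, assuming prometheus-adapter was installed as a Helm release named
# "prometheus-adapter" in $MON_NS (both names are assumptions).
MON_NS=openshift-monitoring
# Capture the release's current values, append the WVA sample values,
# then upgrade with the merged file.
helm get values prometheus-adapter -n "$MON_NS" -o yaml > /tmp/current-values.yaml
cat config/samples/prometheus-adapter-values-ocp.yaml >> /tmp/current-values.yaml
helm upgrade prometheus-adapter prometheus-community/prometheus-adapter \
  -n "$MON_NS" -f /tmp/current-values.yaml
```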
| ### Infrastructure Installation |
as discussed above this is out-of-scope.
| control-plane: controller-manager | ||
| openshift.io/user-monitoring: "true" | ||
| EOF | ||
| ### WVA Controller Installation |
I would like to document both installation modes, cluster-wide and namespace-wide installation. At least in this PR, can you mention what installation mode the instructions below are covering?
I will add a short sentence in this section to high-light this option.
| ### Step 4: Add Models as Scale Targets To WVA Controller | ||
| For more configurable parameters for WVA controller see $WVA_PROJECT/charts/workload-variant-autoscaler/values.yaml | ||
|
|
||
| ### WVA Variant Installation |
Eventually this section will go away. Managing VA is only supported using yaml manifests (with kustomize or not).
ok. I will leave it with helm for now.
| # Run deployment script | ||
| bash install.sh | ||
| ``` | ||
| #### Installation |
I would remove this section as 1. the helm chart doc points back to this document 2. the helm chart doc should not document infra installation.
ok, will remove this section. Stuff under deploy directory can use another pass to clean up.
|
/trigger-e2e-full |
|
🚀 Kind E2E (full) triggered by |
|
/ok-to-test |
|
🚀 OpenShift E2E — approve and run ( |
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
|
|
|
||
| kubectl get secret thanos-querier-tls -n openshift-monitoring -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/prometheus-ca.crt | ||
| ### Download and Setup Variables | ||
| ```bash | ||
| export WVA_RELEASE="v0.5.1" # select a release from https://github.com/llm-d/llm-d-workload-variant-autoscaler |
maybe keep the release generic, instead of pinning one to avoid users pinning that release.
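One way to keep the release generic, as suggested, is to resolve the latest tag at install time instead of hard-coding one. This is an untested sketch that assumes the standard GitHub "latest release" JSON response shape and avoids a `jq` dependency:

```shell
# Sketch: resolve the latest WVA release tag from the GitHub API instead of
# pinning a version (response-parsing via grep/sed is an assumption; jq would
# be cleaner if available).
latest_wva_release() {
  curl -s "https://api.github.com/repos/llm-d/llm-d-workload-variant-autoscaler/releases/latest" |
    grep '"tag_name"' |
    sed -E 's/.*"tag_name": *"([^"]+)".*/\1/'
}
# Usage (requires network access):
# export WVA_RELEASE="$(latest_wva_release)"
```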
|
@shuynh2017 have one pass at this, and we can merge it to keep things rolling. Thanks. |
Co-authored-by: Mohammed Munir Abdi <abdimamy@gmail.com> Signed-off-by: Lionel Villard <villard@us.ibm.com>
|
|
||
| git clone -b $WVA_RELEASE -- https://github.com/$OWNER/$WVA_PROJECT.git $WVA_PROJECT | ||
| cd $WVA_PROJECT | ||
| git clone -b $WVA_RELEASE -- https://github.com/llm-d/llm-d-workload-variant-autoscaler.git llm-d-workload-variant-autoscaler |
there is no need to clone the repository. Instead, these instructions should point to the chart registry.
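Pointing at a chart registry instead of cloning could look like the sketch below. The OCI URL is an assumption for illustration; the actual published location of the chart would need to be confirmed.

```shell
# Hypothetical sketch: install from a published chart registry rather than a
# git clone. The oci:// URL below is an assumed, not confirmed, location.
helm install workload-variant-autoscaler \
  oci://ghcr.io/llm-d/charts/workload-variant-autoscaler \
  --version "${WVA_RELEASE#v}" \
  -n workload-variant-autoscaler-system --create-namespace
```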
|
|
||
| cd $WVA_PROJECT/charts | ||
| helm upgrade -i workload-variant-autoscaler ./workload-variant-autoscaler \ | ||
| helm install workload-variant-autoscaler ./workload-variant-autoscaler \ |
this documentation should not assume the WVA repository has been cloned. All helm commands should use published helm charts.
| ### WVA Variant Installation | ||
| After a WVA controller has been installed, | ||
| you can add one or more models running in LLMD namespaces as scale targets to the WVA controller. As an example, the following command adds model name `my-model-a` with model ID `meta-llama/Llama-3.1-8` running in `team-a` LLMD namespace. This command creates the corresponding VA, HPA resources in `team-a` namespace. | ||
| you can add one or more models running in llm-d namespaces as scale targets to the WVA controller by creating WVA variants. As an example, the following command adds model name `my-model-a` with model ID `meta-llama/Llama-3.1-8` running in `team-a` llm-d namespace. This command creates the corresponding VA, HPA resources in `team-a` namespace. Again, please note the required values for some of these parameters. |
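After adding a model as a scale target, the resulting resources described above can be verified directly. An untested sketch, using the namespace and resource kinds named in the quoted text:

```shell
# Confirm the variant install created the VariantAutoscaling (VA) and HPA
# resources in the llm-d namespace ("team-a" is the example namespace above).
kubectl get variantautoscalings -n team-a
kubectl get hpa -n team-a
```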
| ### Infra-Only Setup (Required Before Running Tests) | ||
|
|
||
| Tests expect **only** the WVA controller and llm-d infrastructure to be deployed; they create VariantAutoscaling resources, HPAs, and model services themselves. Use the install script in **infra-only** mode: | ||
| Tests expect **only** the WVA controller and llm-d infrastructure to be deployed. The tests create VariantAutoscaling resources, HPAs, and model services themselves. Follow |
actually the deleted text below is better. For development the recommended path is to use the install.sh script (and makefile targets)
| WVA watches a single InferencePool API group (`inference.networking.k8s.io` or `inference.networking.x-k8s.io`). If the cluster's pools use the other group, the datastore stays empty and scale-from-zero never gets a recommendation. | ||
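A quick way to check which InferencePool API group a cluster actually serves is sketched below (untested; both candidate group names come from the quoted text):

```shell
# Untested sketch: discover which InferencePool API group is served, then
# list the pools under it.
if kubectl api-resources --api-group=inference.networking.k8s.io 2>/dev/null |
     grep -qi inferencepool; then
  kubectl get inferencepools.inference.networking.k8s.io -A
else
  kubectl get inferencepools.inference.networking.x-k8s.io -A
fi
```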
|
|
||
| **Solution**: Ensure InferencePool is created and reconciled before creating VariantAutoscaling. When using `deploy/install.sh` with llm-d (e.g. kind-emulator or CI), the script auto-detects the pool API group after llm-d deploy and upgrades WVA with the correct `wva.poolGroup` so both local and CI work regardless of llm-d version. | ||
| **Solution**: Ensure InferencePool is created and reconciled before creating VariantAutoscaling. When using [Infrastructure Installation](../../charts/workload-variant-autoscaler/README.md#infrastructure-installation) with llm-d (e.g. kind-emulator or CI), the script auto-detects the pool API group after llm-d deploy and upgrades WVA with the correct `wva.poolGroup` so both local and CI work regardless of llm-d version. |
as discussed, the helm chart README should not include instructions about infrastructure installation.
| ### E2E and infra-only deploys | ||
|
|
||
| For e2e and infra-only deploys, the install script enables EPP flow control and optionally applies an InferenceObjective when `E2E_TESTS_ENABLED=true` or `ENABLE_SCALE_TO_ZERO=true`. See [deploy/install.sh](https://github.com/llm-d/llm-d-workload-variant-autoscaler/blob/main/deploy/install.sh) and [deploy/inference-objective-e2e.yaml](https://github.com/llm-d/llm-d-workload-variant-autoscaler/blob/main/deploy/inference-objective-e2e.yaml). | ||
| For e2e and infra-only deploys, the install script enables EPP flow control and optionally applies an InferenceObjective when `E2E_TESTS_ENABLED=true` or `ENABLE_SCALE_TO_ZERO=true`. See [Infrastructure Installation](../../charts/workload-variant-autoscaler/README.md#infrastructure-installation) and [deploy/inference-objective-e2e.yaml](https://github.com/llm-d/llm-d-workload-variant-autoscaler/blob/main/deploy/inference-objective-e2e.yaml). |
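The flag-gated behavior described in this hunk could be exercised as follows. The flag names come from the quoted text; the script path assumes a repository checkout:

```shell
# Enable the optional e2e extras before running the install script; per the
# doc above, either flag turns on EPP flow control and the InferenceObjective.
export E2E_TESTS_ENABLED=true
export ENABLE_SCALE_TO_ZERO=true
bash deploy/install.sh
```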
| -n workload-variant-autoscaler-system \ | ||
| --set wva.configMap.immutable=true | ||
| ``` | ||
| Follow |
I would move this section to the helm chart README.
| - ❌ **NO** VariantAutoscaling resources (tests create these) | ||
| - ❌ **NO** HPA resources (tests create these) | ||
| - ❌ **NO** Model services (tests create these) | ||
| Follow |
Please keep the text above.
Epic: #872
The issues:
Architecturally, the installation procedure needs to be split clearly into three parts: infrastructure, WVA controller, WVA variant.
This PR addresses the above issues by consolidating the installation instructions into `charts/workload-variant-autoscaler/README.md`. As a result:

- For `install.sh` in the .md files, there should be only one place where it's mentioned, which is the above `README.md`.
- For `helm install` / `helm upgrade` for controller or variant, there should also be only one place where it's documented, which is the above `README.md`.

Summary: There should be no functional changes due to this PR, as it only clarifies, consolidates, and prepares for the next step in cleaning up the `install.sh` script itself. As such, for any changes in `install.sh` (such as reducing it to only infrastructure and no more), we will only have to update the document in one place.