
Installation doc: introduce infrastructure, WVA controller, WVA variant installations, and consolidate install instructions. #876

Open
shuynh2017 wants to merge 7 commits into llm-d:main from shuynh2017:shuynh_split_install_steps

Conversation

@shuynh2017
Collaborator

@shuynh2017 shuynh2017 commented Mar 11, 2026

Epic: #872

The issues:

  • The installation procedure is documented in many places, inconsistently and without clarity about whether a given installation covers the infrastructure, the WVA controller, or a WVA variant.
  • There are repeated disclaimers about what is and is not being installed.
  • Installation instructions live in many places, are not consistent, and can only grow more inconsistent.
  • There are other issues with the documentation; however, we will keep the scope of this PR manageable.

Architecturally, the installation procedure needs to be split clearly into 3 parts: infrastructure, WVA controller, WVA variant.

This PR addresses the above issues by:

  • Introducing the official installation procedure as the 3 parts mentioned above, documented in one single place (instead of many): charts/workload-variant-autoscaler/README.md.
  • Updating all places that carry installation procedures to point to that single-source README.md, at the appropriate section (infrastructure, controller, variant).
  • Removing redundant, unnecessary installation options such as the minimal installation, the developer installation, etc.

As a result:

  • If one searches the .md files for running install.sh, there should be only one place where it is mentioned: the above README.md.
  • If one searches for helm install or helm upgrade for the controller or variant, there should likewise be only one place where it is documented: the above README.md.

Summary: This PR introduces no functional changes; it only clarifies, consolidates, and prepares for the next step of cleaning up the install.sh script itself. After that, for any change in install.sh (such as reducing it to infrastructure only and no more), we will only have to update the documentation in one place.
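The consolidation claim above can be checked mechanically. This is a sketch: it builds a tiny fixture tree so it is runnable anywhere, but in practice you would run the same `grep` against the real repo root and expect a count of exactly 1.

```shell
# Count how many .md files mention install.sh; after this PR it should be 1.
tmp=$(mktemp -d)
mkdir -p "$tmp/charts/workload-variant-autoscaler" "$tmp/docs"
printf 'Run ./deploy/install.sh\n' > "$tmp/charts/workload-variant-autoscaler/README.md"
printf 'See the chart README for installation.\n' > "$tmp/docs/guide.md"
matches=$(grep -rl 'install.sh' --include='*.md' "$tmp" | wc -l | tr -d ' ')
echo "files mentioning install.sh: $matches"
rm -rf "$tmp"
```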

@shuynh2017
Collaborator Author

@lionelvillard

-f config/samples/prometheus-adapter-values-ocp.yaml
export HF_TOKEN="hf_xxxxx"
cd $WVA_PROJECT
./deploy/install.sh # SUMH: can we move it to a new 'scripts' folder and rename to install-infra.sh?
Collaborator


I wonder if llm-d provides such a script already. We aren't the only project to install llm-d, are we?

Collaborator Author

@shuynh2017 shuynh2017 Mar 11, 2026


Yep, we have a function deploy_llm_d_infrastructure that downloads llm-d, runs a couple of scripts from llm-d, and does a number of other things.

@shuynh2017 shuynh2017 marked this pull request as draft March 11, 2026 15:19
@shuynh2017 shuynh2017 changed the title [DO NOT MERGE] doc: Spit to 3 install steps [DO NOT MERGE] Installation doc: introduce infrastructure, WVA controller, WVA variant installations, and consolidate install instructions. Mar 13, 2026
@shuynh2017 shuynh2017 changed the title [DO NOT MERGE] Installation doc: introduce infrastructure, WVA controller, WVA variant installations, and consolidate install instructions. Installation doc: introduce infrastructure, WVA controller, WVA variant installations, and consolidate install instructions. Mar 13, 2026
@shuynh2017 shuynh2017 marked this pull request as ready for review March 13, 2026 21:37
@shuynh2017
Collaborator Author

@lionelvillard, this is ready for review. Sorry for the many changes but I think we need to make these changes in one step. Thanks.

@asm582
Collaborator

asm582 commented Mar 16, 2026

Do we know what is pending in this PR? Can we squash commits?


```bash
export HF_TOKEN="hf_xxxxx"
export ENVIRONMENT="kind-emulator" # or "openshift", "kubernetes"
Collaborator


nit: I vote to drop Kubernetes.

Collaborator Author


ok, once we decide, I can remove its support from the installation to reduce the code (in the following PR)

Collaborator


Agree with @asm582. This script is only used for e2e tests and we currently only support kind and openshift

Collaborator Author


ok, I will remove from README.md now. We can remove related code from install.sh later.

Collaborator

@asm582 asm582 left a comment


Added nit

@shuynh2017
Collaborator Author

@lionelvillard pls review. We can review off-line as well. Thanks.

shuynh2017 and others added 2 commits March 18, 2026 08:48
Signed-off-by: Sum Huynh <31661254+shuynh2017@users.noreply.github.com>
@lionelvillard
Collaborator

/ok-to-test

@lionelvillard
Collaborator

/trigger-e2e-full

@github-actions
Contributor

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions
Contributor

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run


Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM inference workloads
## Installation Overview
Installation is divided into three separate parts. First is the infrastructure installation, where WVA prerequisites such as llm-d, the gateway control plane, Prometheus, the Prometheus adapter, and KEDA are installed and configured to work with WVA on environments such as OpenShift or Kubernetes. Next, a WVA controller can be installed in a namespace. Finally, for each WVA controller, one or more WVA variants can be installed as scaling targets for model deployments in llm-d namespaces.
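The three parts described in this overview could be summarized as a command outline. This is a sketch assembled from commands quoted elsewhere in this PR; the variant release name and namespace in step 3 are illustrative, and exact flags and values belong to the chart README.

```shell
# 1. Infrastructure: llm-d, gateway control plane, Prometheus (+ adapter), KEDA.
./deploy/install.sh

# 2. WVA controller: one controller installed into its namespace.
helm install workload-variant-autoscaler ./workload-variant-autoscaler \
  -n workload-variant-autoscaler-system

# 3. WVA variant(s): one per model deployment, added as scale targets
#    (release name "my-model-a-variant" and namespace "team-a" are illustrative).
helm install my-model-a-variant ./workload-variant-autoscaler \
  -n team-a
```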
Collaborator


infrastructure installation is out-of-scope. Instead add a prerequisite section including llm-d infrastructure as a required prerequisite.

Collaborator Author


@lionelvillard, in the pre-req section, do we want to mention using install.sh or just mention the components need to be there? If we don't mention install.sh then we are on a good path to reduce it to just for developer (kind install).

Collaborator


install.sh should be bubbled only in the developer guides and not exposed to the user guides is the general thinking.

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 28 | 22 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

settings, manually append the values in `config/samples/prometheus-adapter-values-ocp.yaml`
then run helm upgrade with the appended values. Here's an example of how to get the current
values: `kubectl get configmap prometheus-adapter -n $MON_NS -o yaml`
### Infrastructure Installation
Collaborator


as discussed above this is out-of-scope.

control-plane: controller-manager
openshift.io/user-monitoring: "true"
EOF
### WVA Controller Installation
Collaborator


I would like to document both installation modes, cluster-wide and namespace-wide installation. At least in this PR, can you mention what installation mode the instructions below are covering?
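For reference, the two modes could be distinguished at install time roughly as below. The `wva.watchNamespace` value name is a hypothetical placeholder used for illustration only, not a confirmed chart value.

```shell
# Namespace-scoped: the controller watches a single namespace.
# NOTE: wva.watchNamespace is a hypothetical value name.
helm install workload-variant-autoscaler ./workload-variant-autoscaler \
  -n workload-variant-autoscaler-system \
  --set wva.watchNamespace=team-a

# Cluster-wide: the controller watches all namespaces.
helm install workload-variant-autoscaler ./workload-variant-autoscaler \
  -n workload-variant-autoscaler-system \
  --set wva.watchNamespace=""
```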

Collaborator Author


I will add a short sentence in this section to highlight this option.

### Step 4: Add Models as Scale Targets To WVA Controller
For more configurable parameters for WVA controller see $WVA_PROJECT/charts/workload-variant-autoscaler/values.yaml

### WVA Variant Installation
Collaborator


Eventually this section will go away. Managing VA is only supported using yaml manifests (with kustomize or not).

Collaborator Author


ok. I will leave it with helm for now.

Comment thread deploy/README.md
# Run deployment script
bash install.sh
```
#### Installation
Collaborator


I would remove this section because 1) the helm chart doc points back to this document, and 2) the helm chart doc should not document infra installation.

Collaborator Author


ok, will remove this section. Stuff under deploy directory can use another pass to clean up.

@mamy-CS
Collaborator

mamy-CS commented Mar 24, 2026

/trigger-e2e-full

@github-actions
Contributor

🚀 Kind E2E (full) triggered by /trigger-e2e-full

View the Kind E2E workflow run

@mamy-CS
Collaborator

mamy-CS commented Mar 24, 2026

/ok-to-test

@github-actions
Contributor

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

| Resource | Total | Allocated | Available |
| --- | --- | --- | --- |
| GPUs | 50 | 24 | 26 |

| Cluster | Value |
| --- | --- |
| Nodes | 16 (7 with GPUs) |
| Total CPU | 993 cores |
| Total Memory | 10383 Gi |
| GPUs required | 4 (min) / 6 (recommended) |

Comment thread charts/workload-variant-autoscaler/README.md Outdated

Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM inference workloads
## Installation Overview
Installation is divided into three separate parts. First is the infrastructure installation, where WVA prerequisites such as llm-d, the gateway control plane, Prometheus, the Prometheus adapter, and KEDA are installed and configured to work with WVA on environments such as OpenShift or Kubernetes. Next, a WVA controller can be installed in a namespace. Finally, for each WVA controller, one or more WVA variants can be installed as scaling targets for model deployments in llm-d namespaces.
Collaborator


install.sh should be bubbled only in the developer guides and not exposed to the user guides is the general thinking.

kubectl get secret thanos-querier-tls -n openshift-monitoring -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/prometheus-ca.crt
### Download and Setup Variables
```bash
export WVA_RELEASE="v0.5.1" # select a release from https://github.com/llm-d/llm-d-workload-variant-autoscaler
Collaborator


maybe keep the release generic, instead of pinning one to avoid users pinning that release.

@mamy-CS
Collaborator

mamy-CS commented Mar 25, 2026

@shuynh2017 have one pass at this, and we can merge it to keep things rolling. Thanks.

Co-authored-by: Mohammed Munir Abdi <abdimamy@gmail.com>
Signed-off-by: Lionel Villard <villard@us.ibm.com>

git clone -b $WVA_RELEASE -- https://github.com/$OWNER/$WVA_PROJECT.git $WVA_PROJECT
cd $WVA_PROJECT
git clone -b $WVA_RELEASE -- https://github.com/llm-d/llm-d-workload-variant-autoscaler.git llm-d-workload-variant-autoscaler
Collaborator


there is no need to clone the repository. Instead, these instructions should point to the chart registry.


cd $WVA_PROJECT/charts
helm upgrade -i workload-variant-autoscaler ./workload-variant-autoscaler \
helm install workload-variant-autoscaler ./workload-variant-autoscaler \
Collaborator


this documentation should not assume the WVA repository has been cloned. All helm commands should use published helm charts.
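Using a published chart instead of a local clone might look like the following. The `oci://` registry path below is a hypothetical placeholder; the real location depends on where the chart is published.

```shell
# Install/upgrade from a published chart rather than a local checkout.
# NOTE: the oci:// path is a hypothetical placeholder.
helm upgrade -i workload-variant-autoscaler \
  oci://ghcr.io/llm-d/charts/workload-variant-autoscaler \
  --version "$WVA_RELEASE" \
  -n workload-variant-autoscaler-system --create-namespace
```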

### WVA Variant Installation
After a WVA controller has been installed,
you can add one or more models running in LLMD namespaces as scale targets to the WVA controller. As an example, the following command adds model name `my-model-a` with model ID `meta-llama/Llama-3.1-8` running in `team-a` LLMD namespace. This command creates the corresponding VA, HPA resources in `team-a` namespace.
you can add one or more models running in llm-d namespaces as scale targets to the WVA controller by creating WVA variants. As an example, the following command adds model name `my-model-a` with model ID `meta-llama/Llama-3.1-8` running in `team-a` llm-d namespace. This command creates the corresponding VA, HPA resources in `team-a` namespace. Again, please note the required values for some of these parameters.
Collaborator


nit: you -> You.

### Infra-Only Setup (Required Before Running Tests)

Tests expect **only** the WVA controller and llm-d infrastructure to be deployed; they create VariantAutoscaling resources, HPAs, and model services themselves. Use the install script in **infra-only** mode:
Tests expect **only** the WVA controller and llm-d infrastructure to be deployed. The tests create VariantAutoscaling resources, HPAs, and model services themselves. Follow
Collaborator


actually the deleted text below is better. For development the recommended path is to use the install.sh script (and makefile targets)

WVA watches a single InferencePool API group (`inference.networking.k8s.io` or `inference.networking.x-k8s.io`). If the cluster's pools use the other group, the datastore stays empty and scale-from-zero never gets a recommendation.

**Solution**: Ensure InferencePool is created and reconciled before creating VariantAutoscaling. When using `deploy/install.sh` with llm-d (e.g. kind-emulator or CI), the script auto-detects the pool API group after llm-d deploy and upgrades WVA with the correct `wva.poolGroup` so both local and CI work regardless of llm-d version.
**Solution**: Ensure InferencePool is created and reconciled before creating VariantAutoscaling. When using [Infrastructure Installation](../../charts/workload-variant-autoscaler/README.md#infrastructure-installation) with llm-d (e.g. kind-emulator or CI), the script auto-detects the pool API group after llm-d deploy and upgrades WVA with the correct `wva.poolGroup` so both local and CI work regardless of llm-d version.
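The auto-detection described here can be sketched as follows. The `kubectl api-resources` output is stubbed with a fixed string so the selection logic itself is runnable anywhere; a real script would populate the variable from the cluster.

```shell
# Stub of `kubectl api-resources` output; a real script would use:
#   api_resources=$(kubectl api-resources 2>/dev/null)
api_resources="inferencepools   inference.networking.x-k8s.io/v1alpha2   true   InferencePool"

# Pick whichever InferencePool API group the cluster actually serves.
case "$api_resources" in
  *"inference.networking.k8s.io/"*)   pool_group="inference.networking.k8s.io" ;;
  *"inference.networking.x-k8s.io/"*) pool_group="inference.networking.x-k8s.io" ;;
  *) echo "no InferencePool API group found" >&2; exit 1 ;;
esac
echo "wva.poolGroup=$pool_group"
# The install script would then upgrade WVA with the detected group, e.g.:
#   helm upgrade ... --set wva.poolGroup="$pool_group"
```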
Collaborator


as discussed, the helm chart README should not include instructions about infrastructure installation.

### E2E and infra-only deploys

For e2e and infra-only deploys, the install script enables EPP flow control and optionally applies an InferenceObjective when `E2E_TESTS_ENABLED=true` or `ENABLE_SCALE_TO_ZERO=true`. See [deploy/install.sh](https://github.com/llm-d/llm-d-workload-variant-autoscaler/blob/main/deploy/install.sh) and [deploy/inference-objective-e2e.yaml](https://github.com/llm-d/llm-d-workload-variant-autoscaler/blob/main/deploy/inference-objective-e2e.yaml).
For e2e and infra-only deploys, the install script enables EPP flow control and optionally applies an InferenceObjective when `E2E_TESTS_ENABLED=true` or `ENABLE_SCALE_TO_ZERO=true`. See [Infrastructure Installation](../../charts/workload-variant-autoscaler/README.md#infrastructure-installation) and [deploy/inference-objective-e2e.yaml](https://github.com/llm-d/llm-d-workload-variant-autoscaler/blob/main/deploy/inference-objective-e2e.yaml).
Collaborator


Same comment here.

-n workload-variant-autoscaler-system \
--set wva.configMap.immutable=true
```
Collaborator


I would move this section to the helm chart README.

Comment thread test/e2e/README.md
- ❌ **NO** VariantAutoscaling resources (tests create these)
- ❌ **NO** HPA resources (tests create these)
- ❌ **NO** Model services (tests create these)
Collaborator


Please keep the text above.
