Installation doc: introduce infrastructure, WVA controller, WVA variant installations, and consolidate install instructions #876
Conversation
| -f config/samples/prometheus-adapter-values-ocp.yaml | ||
| export HF_TOKEN="hf_xxxxx" | ||
| cd $WVA_PROJECT | ||
| ./deploy/install.sh # SUMH: can we move it to a new 'scripts' folder and rename to install-infra.sh? |
I wonder if llm-d provides such a script already. We aren't the only project to install llm-d, are we?
Yep, we have a function deploy_llm_d_infrastructure that downloads llm-d, runs a couple of scripts from llm-d, and does a number of other things.
|
@lionelvillard, this is ready for review. Sorry for the many changes but I think we need to make these changes in one step. Thanks. |
|
Do we know what is pending in this PR? Can we squash commits? |
|
|
||
| ```bash | ||
| export HF_TOKEN="hf_xxxxx" | ||
| export ENVIRONMENT="kind-emulator" # or "openshift", "kubernetes" |
nit: I vote to drop Kubernetes.
ok, once we decide, I can remove its support from the installation to reduce the code (in the following PR)
Agree with @asm582. This script is only used for e2e tests and we currently only support kind and openshift
ok, I will remove from README.md now. We can remove related code from install.sh later.
|
@lionelvillard pls review. We can review off-line as well. Thanks. |
Signed-off-by: Sum Huynh <31661254+shuynh2017@users.noreply.github.com>
|
/ok-to-test |
|
/trigger-e2e-full |
|
🚀 OpenShift E2E — approve and run ( |
|
🚀 Kind E2E (full) triggered by |
|
|
||
| Helm chart for Workload-Variant-Autoscaler (WVA) - GPU-aware autoscaler for LLM inference workloads | ||
| ## Installation Overview | ||
| Installation is divided into three separate parts. First is the infrastructure installation, where WVA prerequisites such as llm-d, the gateway control plane, Prometheus, the Prometheus adapter, and KEDA are installed and configured to work with WVA on environments such as OpenShift or Kubernetes. Next, a WVA controller can be installed in a namespace. Finally, for each WVA controller, one or more WVA variants can be installed as scaling targets for model deployments in llm-d namespaces. |
infrastructure installation is out-of-scope. Instead add a prerequisite section including llm-d infrastructure as a required prerequisite.
@lionelvillard, in the pre-req section, do we want to mention using install.sh or just mention the components need to be there? If we don't mention install.sh then we are on a good path to reduce it to just for developer (kind install).
install.sh should be bubbled only in the developer guides and not exposed to the user guides is the general thinking.
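The three-part flow described in the PR's overview could be sketched roughly as follows. This is an illustrative outline only: the chart paths and namespaces below are assumptions, not confirmed values from the chart.

```shell
# Illustrative sketch — chart locations and namespaces are assumptions.
# 1) Infrastructure prerequisites (llm-d, Prometheus + adapter, KEDA) are
#    installed first; per this review, that step lives in the developer guides.
# 2) Install the WVA controller into its own namespace:
helm install workload-variant-autoscaler ./workload-variant-autoscaler \
  -n workload-variant-autoscaler-system --create-namespace
# 3) For each controller, install one or more WVA variants as scale targets
#    in the llm-d namespaces that run the models:
helm install my-model-a-variant ./workload-variant-autoscaler-variant \
  -n team-a
```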
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
|
| settings, manually append the values in `config/samples/prometheus-adapter-values-ocp.yaml` | ||
| then run helm upgrade with the appended values. Here's an example how to get the current | ||
| values: `kubectl get configmap prometheus-adapter -n $MON_NS -o yaml` | ||
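The append-then-upgrade step described above might look like the following sketch. The release name, chart reference, and namespace are assumptions, and `helm get values` is used here as an alternative to reading the ConfigMap directly.

```shell
# Sketch, assuming prometheus-adapter was installed as a Helm release named
# "prometheus-adapter" in $MON_NS (both names are assumptions).
MON_NS=openshift-monitoring
# Capture the release's current values, append the WVA sample values,
# then upgrade with the merged file.
helm get values prometheus-adapter -n "$MON_NS" -o yaml > /tmp/current-values.yaml
cat config/samples/prometheus-adapter-values-ocp.yaml >> /tmp/current-values.yaml
helm upgrade prometheus-adapter prometheus-community/prometheus-adapter \
  -n "$MON_NS" -f /tmp/current-values.yaml
```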
| ### Infrastructure Installation |
as discussed above this is out-of-scope.
| control-plane: controller-manager | ||
| openshift.io/user-monitoring: "true" | ||
| EOF | ||
| ### WVA Controller Installation |
I would like to document both installation modes, cluster-wide and namespace-wide installation. At least in this PR, can you mention what installation mode the instructions below are covering?
I will add a short sentence in this section to high-light this option.
| ### Step 4: Add Models as Scale Targets To WVA Controller | ||
| For more configurable parameters for WVA controller see $WVA_PROJECT/charts/workload-variant-autoscaler/values.yaml | ||
|
|
||
| ### WVA Variant Installation |
Eventually this section will go away. Managing VA is only supported using yaml manifests (with kustomize or not).
ok. I will leave it with helm for now.
| # Run deployment script | ||
| bash install.sh | ||
| ``` | ||
| #### Installation |
I would remove this section as 1. the helm chart doc points back to this document 2. the helm chart doc should not document infra installation.
ok, will remove this section. Stuff under deploy directory can use another pass to clean up.
|
/trigger-e2e-full |
|
🚀 Kind E2E (full) triggered by |
|
/ok-to-test |
|
🚀 OpenShift E2E — approve and run ( |
GPU Pre-flight Check ✅ GPUs are available for e2e-openshift tests. Proceeding with deployment.
|
|
|
||
| kubectl get secret thanos-querier-tls -n openshift-monitoring -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/prometheus-ca.crt | ||
| ### Download and Setup Variables | ||
| ```bash | ||
| export WVA_RELEASE="v0.5.1" # select a release from https://github.com/llm-d/llm-d-workload-variant-autoscaler |
maybe keep the release generic, instead of pinning one to avoid users pinning that release.
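One way to keep the release generic, as suggested, is to resolve the latest tag at install time instead of hard-coding one. This is an untested sketch that assumes the standard GitHub "latest release" JSON response shape and avoids a `jq` dependency:

```shell
# Sketch: resolve the latest WVA release tag from the GitHub API instead of
# pinning a version (response-parsing via grep/sed is an assumption; jq would
# be cleaner if available).
latest_wva_release() {
  curl -s "https://api.github.com/repos/llm-d/llm-d-workload-variant-autoscaler/releases/latest" |
    grep '"tag_name"' |
    sed -E 's/.*"tag_name": *"([^"]+)".*/\1/'
}
# Usage (requires network access):
# export WVA_RELEASE="$(latest_wva_release)"
```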
|
@shuynh2017 have one pass at this, and we can merge it to keep things rolling. Thanks. |
Co-authored-by: Mohammed Munir Abdi <abdimamy@gmail.com> Signed-off-by: Lionel Villard <villard@us.ibm.com>
|
|
||
| git clone -b $WVA_RELEASE -- https://github.com/$OWNER/$WVA_PROJECT.git $WVA_PROJECT | ||
| cd $WVA_PROJECT | ||
| git clone -b $WVA_RELEASE -- https://github.com/llm-d/llm-d-workload-variant-autoscaler.git llm-d-workload-variant-autoscaler |
there is no need to clone the repository. Instead, these instructions should point to the chart registry.
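Pointing at a chart registry instead of cloning could look like the sketch below. The OCI URL is an assumption for illustration; the actual published location of the chart would need to be confirmed.

```shell
# Hypothetical sketch: install from a published chart registry rather than a
# git clone. The oci:// URL below is an assumed, not confirmed, location.
helm install workload-variant-autoscaler \
  oci://ghcr.io/llm-d/charts/workload-variant-autoscaler \
  --version "${WVA_RELEASE#v}" \
  -n workload-variant-autoscaler-system --create-namespace
```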
|
|
||
| cd $WVA_PROJECT/charts | ||
| helm upgrade -i workload-variant-autoscaler ./workload-variant-autoscaler \ | ||
| helm install workload-variant-autoscaler ./workload-variant-autoscaler \ |
this documentation should not assume the WVA repository has been cloned. All helm commands should use published helm charts.
| ### WVA Variant Installation | ||
| After a WVA controller has been installed, | ||
| you can add one or more models running in LLMD namespaces as scale targets to the WVA controller. As an example, the following command adds model name `my-model-a` with model ID `meta-llama/Llama-3.1-8` running in `team-a` LLMD namespace. This command creates the corresponding VA, HPA resources in `team-a` namespace. | ||
| you can add one or more models running in llm-d namespaces as scale targets to the WVA controller by creating WVA variants. As an example, the following command adds model name `my-model-a` with model ID `meta-llama/Llama-3.1-8` running in `team-a` llm-d namespace. This command creates the corresponding VA, HPA resources in `team-a` namespace. Again, please note the required values for some of these parameters. |
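After adding a model as a scale target, the resulting resources described above can be verified directly. An untested sketch, using the namespace and resource kinds named in the quoted text:

```shell
# Confirm the variant install created the VariantAutoscaling (VA) and HPA
# resources in the llm-d namespace ("team-a" is the example namespace above).
kubectl get variantautoscalings -n team-a
kubectl get hpa -n team-a
```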
| ### Infra-Only Setup (Required Before Running Tests) | ||
|
|
||
| Tests expect **only** the WVA controller and llm-d infrastructure to be deployed; they create VariantAutoscaling resources, HPAs, and model services themselves. Use the install script in **infra-only** mode: | ||
| Tests expect **only** the WVA controller and llm-d infrastructure to be deployed. The tests create VariantAutoscaling resources, HPAs, and model services themselves. Follow |
actually the deleted text below is better. For development the recommended path is to use the install.sh script (and makefile targets)
| WVA watches a single InferencePool API group (`inference.networking.k8s.io` or `inference.networking.x-k8s.io`). If the cluster's pools use the other group, the datastore stays empty and scale-from-zero never gets a recommendation. | ||
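A quick way to check which InferencePool API group a cluster actually serves is sketched below (untested; both candidate group names come from the quoted text):

```shell
# Untested sketch: discover which InferencePool API group is served, then
# list the pools under it.
if kubectl api-resources --api-group=inference.networking.k8s.io 2>/dev/null |
     grep -qi inferencepool; then
  kubectl get inferencepools.inference.networking.k8s.io -A
else
  kubectl get inferencepools.inference.networking.x-k8s.io -A
fi
```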
|
|
||
| **Solution**: Ensure InferencePool is created and reconciled before creating VariantAutoscaling. When using `deploy/install.sh` with llm-d (e.g. kind-emulator or CI), the script auto-detects the pool API group after llm-d deploy and upgrades WVA with the correct `wva.poolGroup` so both local and CI work regardless of llm-d version. | ||
| **Solution**: Ensure InferencePool is created and reconciled before creating VariantAutoscaling. When using [Infrastructure Installation](../../charts/workload-variant-autoscaler/README.md#infrastructure-installation) with llm-d (e.g. kind-emulator or CI), the script auto-detects the pool API group after llm-d deploy and upgrades WVA with the correct `wva.poolGroup` so both local and CI work regardless of llm-d version. |
as discussed, the helm chart README should not include instructions about infrastructure installation.
| ### E2E and infra-only deploys | ||
|
|
||
| For e2e and infra-only deploys, the install script enables EPP flow control and optionally applies an InferenceObjective when `E2E_TESTS_ENABLED=true` or `ENABLE_SCALE_TO_ZERO=true`. See [deploy/install.sh](https://github.com/llm-d/llm-d-workload-variant-autoscaler/blob/main/deploy/install.sh) and [deploy/inference-objective-e2e.yaml](https://github.com/llm-d/llm-d-workload-variant-autoscaler/blob/main/deploy/inference-objective-e2e.yaml). | ||
| For e2e and infra-only deploys, the install script enables EPP flow control and optionally applies an InferenceObjective when `E2E_TESTS_ENABLED=true` or `ENABLE_SCALE_TO_ZERO=true`. See [Infrastructure Installation](../../charts/workload-variant-autoscaler/README.md#infrastructure-installation) and [deploy/inference-objective-e2e.yaml](https://github.com/llm-d/llm-d-workload-variant-autoscaler/blob/main/deploy/inference-objective-e2e.yaml). |
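The flag-gated behavior described in this hunk could be exercised as follows. The flag names come from the quoted text; the script path assumes a repository checkout:

```shell
# Enable the optional e2e extras before running the install script; per the
# doc above, either flag turns on EPP flow control and the InferenceObjective.
export E2E_TESTS_ENABLED=true
export ENABLE_SCALE_TO_ZERO=true
bash deploy/install.sh
```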
| -n workload-variant-autoscaler-system \ | ||
| --set wva.configMap.immutable=true | ||
| ``` | ||
| Follow |
I would move this section to the helm chart README.
| - ❌ **NO** VariantAutoscaling resources (tests create these) | ||
| - ❌ **NO** HPA resources (tests create these) | ||
| - ❌ **NO** Model services (tests create these) | ||
| Follow |
Please keep the text above.
Epic: #872
The issues:
Architecturally, the installation procedure needs to be split clearly into three parts: infrastructure, WVA controller, WVA variant.
This PR addresses the above issues by consolidating the installation instructions into `charts/workload-variant-autoscaler/README.md`. As a result:

- For `install.sh` in the .md files, there should be only one place where it's mentioned, which is the above `README.md`.
- For `helm install` / `helm upgrade` for controller or variant, there should also be only one place where it's documented, which is the above `README.md`.

Summary: There should be no functional changes due to this PR, as it only clarifies, consolidates, and prepares for the next step in cleaning up the `install.sh` script itself. As such, for any changes in `install.sh` (such as reducing it to only infrastructure and no more), we will only have to update the document in one place.