Convert multi-model deploy script from bash to Go#1015
Conversation
Addresses review feedback from #1014 to move away from bash deployment scripts for readability, type safety, and concurrent model deployment. Key improvements: - Models 2..N deploy concurrently via goroutines (bash was sequential) - Connectivity verification uses kubectl port-forward from the Go process, eliminating the in-cluster curl Job and its Docker Hub image (curlimages/curl:latest) - Kubernetes resources (Gateway, HTTPRoute) created via dynamic client instead of heredoc YAML - Proper error handling and structured logging The Go tool is invoked via `go run ./deploy/multimodel` from the same Makefile targets (deploy-multi-model-infra, undeploy-multi-model-infra). Made-with: Cursor
GPU Pre-flight Check ✅GPUs are available for e2e-openshift tests. Proceeding with deployment.
|
- Add INSTALL_GATEWAY_CTRLPLANE to Makefile passthrough (default: false for shared clusters with existing Istio) - Set E2E_TESTS_ENABLED=true to prevent interactive prompts in install.sh - Guard modelArtifacts.labels yq call in infra_llmd.sh to avoid schema validation errors on chart versions that don't support custom labels - Remove unused import in main.go Made-with: Cursor
GPU Pre-flight Check ✅GPUs are available for e2e-openshift tests. Proceeding with deployment.
|
|
Thanks, please test this PR locally with command similar to: Please share a snippet of the test being passed. I know this is manual for now. |
|
Yes, tested locally on OpenShift cluster (wva-bench-test namespace). Full deploy + undeploy + benchmark run completed successfully: Both models deployed and reachable through Gateway |
|
@kahilam linter tests are failing, please fix. |
- Replace fmt.Sprintf with string concatenation (perfsprint) - Preallocate rules slice (prealloc) - Remove unused ctx param from detectInferencePoolAPIGroup (unparam) - Change verifyInferencePools to not return always-nil error (unparam) - Remove unused portStr method and strconv import (unused) Made-with: Cursor
- Use strconv.Itoa instead of fmt.Sprintf for int conversion (perfsprint) - Use string concatenation for svc/ prefix (perfsprint) - Remove trailing blank line in portforward.go (gofmt) Made-with: Cursor
@kahilam, with the new changes, could you please share the output of the command above? |
|
Re-ran the full test with the latest lint fix commits ( Results:
|
|
@asm582 FYI. Full benchmark output (click to expand) |
|
/lgtm |
|
/ok-to-test |
|
🚀 Kind E2E (full) triggered by |
|
🚀 OpenShift E2E — approve and run ( |
|
The VA=1 issue--> The WVA controller is stuck in a transition loop: HPA scales to 2, VA still desires 1, sees desired(1)!=current(2) as "in transition", blocks its own scale-up decision, and keeps trying to scale down to 1. Also seeing pod/pod_name label mismatch for dispatch rate metrics on all pods. This is an HPA/VA conflict in the test setup, not related to the Go conversion. |
|
Thanks for this PR, I do see scale up and scale down: |
|
/lgtm |
| model_label_value=$(echo "$MODEL_ID" | sed 's|.*/||; s|\.|-|g') # e.g. "Qwen3-0-6B", "Meta-Llama-3.1-8B" | ||
| log_info "Setting llm-d.ai/model label to: $model_label_value (unique per model)" | ||
| yq eval ".modelArtifacts.labels.\"llm-d.ai/model\" = \"$model_label_value\"" -i "$LLM_D_MODELSERVICE_VALUES" | ||
| if yq eval '.modelArtifacts | has("labels")' "$LLM_D_MODELSERVICE_VALUES" 2>/dev/null | grep -q "true"; then |
There was a problem hiding this comment.
@kahilam, do we know why we need these changes?
There was a problem hiding this comment.
The modelservice Helm chart has different versions. Newer versions allow me to set custom labels, older versions don't. If I try to set a label on an older chart that doesn't support it, Helm crashes and the deployment fails. This guard checks "does this chart support labels?" before trying to set one, so it works with both old and new chart versions.
Summary
AI-assisted using Cursor IDE.
Converts
deploy/install-multi-model.sh(393-line bash script) into a Go tool atdeploy/multimodel/, addressing review feedback from #1014 (comment) to move away from bash deployment scripts for readability, performance, and enabling concurrent test execution within a single GH workflow run.Key improvements over the bash version:
kubectl port-forwardfrom the Go process, eliminating the in-clustercurlimages/curl:latestJobset -eFiles changed:
deploy/multimodel/main.go,deployer.go,portforward.godeploy/install-multi-model.shMakefile— updated targets to usego run ./deploy/multimodel, addedINSTALL_GATEWAY_CTRLPLANEpassthroughdeploy/lib/infra_llmd.sh— guardedmodelArtifacts.labelsyq call for chart compatibilityFunctional parity:
The Go tool accepts the same environment variables (
MODELS,LLMD_NS,ENVIRONMENT,DECODE_REPLICAS, etc.) and CLI flags (--undeploy) as the bash script. It still delegates todeploy/install.shfor per-model Helm deployments.Tested on OpenShift cluster:
Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B)SUCCESS! -- 1 Passed | 0 Failed)