
Convert multi-model deploy script from bash to Go #1015

Open
kahilam wants to merge 4 commits into main from refactor/multi-model-deploy-to-go

Conversation

@kahilam
Collaborator

@kahilam kahilam commented Apr 15, 2026

Summary

AI-assisted using Cursor IDE.

Converts deploy/install-multi-model.sh (a 393-line bash script) into a Go tool at deploy/multimodel/, addressing review feedback from #1014 (comment) to move away from bash deployment scripts for readability, performance, and concurrent test execution within a single GH workflow run.

Key improvements over the bash version:

  • Concurrent model deployment: Models 2..N deploy in parallel via goroutines (bash was sequential)
  • No Docker Hub images: Connectivity verification uses kubectl port-forward from the Go process, eliminating the in-cluster curlimages/curl:latest Job
  • Type-safe k8s resources: Gateway and HTTPRoute created via the dynamic client instead of heredoc YAML
  • Better error handling: Go error propagation vs bash set -e
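The "models 2..N in parallel" pattern can be sketched roughly as follows. This is a minimal illustration, not the actual tool's code: `deployModel` is a hypothetical stand-in for the real per-model deployment (which delegates to deploy/install.sh), and deploying the first model synchronously before fanning out is an assumption based on the 2..N wording above.

```go
package main

import (
	"fmt"
	"sync"
)

// deployModel is a hypothetical stand-in for the real per-model
// deployment, which delegates to deploy/install.sh.
func deployModel(model string) error {
	fmt.Println("deployed", model)
	return nil
}

// deployAll deploys the first model synchronously, then models 2..N
// concurrently, returning the first error any goroutine recorded.
func deployAll(models []string) error {
	if len(models) == 0 {
		return nil
	}
	if err := deployModel(models[0]); err != nil {
		return err
	}
	var wg sync.WaitGroup
	errs := make([]error, len(models)-1)
	for i, m := range models[1:] {
		wg.Add(1)
		go func(i int, m string) {
			defer wg.Done()
			errs[i] = deployModel(m) // each slot written by exactly one goroutine
		}(i, m)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := deployAll([]string{"Qwen/Qwen3-0.6B", "unsloth/Meta-Llama-3.1-8B"}); err != nil {
		panic(err)
	}
}
```

Writing each goroutine's error into its own slice slot avoids a mutex; `golang.org/x/sync/errgroup` would be an equally idiomatic choice.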

Files changed:

  • Added: deploy/multimodel/main.go, deployer.go, portforward.go
  • Deleted: deploy/install-multi-model.sh
  • Modified: Makefile — updated targets to use go run ./deploy/multimodel, added INSTALL_GATEWAY_CTRLPLANE passthrough
  • Modified: deploy/lib/infra_llmd.sh — guarded modelArtifacts.labels yq call for chart compatibility

Functional parity:

The Go tool accepts the same environment variables (MODELS, LLMD_NS, ENVIRONMENT, DECODE_REPLICAS, etc.) and CLI flags (--undeploy) as the bash script. It still delegates to deploy/install.sh for per-model Helm deployments.
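That parity boils down to reading the same environment variables (with fallbacks) and the same flags. A minimal sketch, where the default values shown are illustrative rather than the tool's actual defaults:

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// getenv returns an environment variable's value, or fallback if unset.
func getenv(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	// Same --undeploy flag the bash script accepted.
	undeploy := flag.Bool("undeploy", false, "tear down instead of deploy")
	flag.Parse()

	// MODELS is a comma-separated list, as in the bash version.
	models := strings.Split(getenv("MODELS", "Qwen/Qwen3-0.6B"), ",")
	ns := getenv("LLMD_NS", "llm-d") // fallback value is illustrative

	fmt.Printf("undeploy=%v ns=%s models=%v\n", *undeploy, ns, models)
}
```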

Tested on OpenShift cluster:

  • Full deploy + undeploy cycle with 2 models (Qwen/Qwen3-0.6B, unsloth/Meta-Llama-3.1-8B)
  • Gateway connectivity verified for both models via port-forward
  • Multi-model scaling benchmark ran successfully (SUCCESS! -- 1 Passed | 0 Failed)
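The port-forward verification described above amounts to retrying an HTTP probe against a locally forwarded port. A minimal sketch, substituting a local stand-in listener for the actual `kubectl port-forward` process (the URL path and retry counts are illustrative):

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// probe retries a GET against a locally forwarded endpoint until it
// answers or attempts run out. In the real flow, a `kubectl port-forward`
// child process (omitted here) would be forwarding this port to the
// in-cluster gateway Service.
func probe(url string, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		var resp *http.Response
		resp, err = http.Get(url)
		if err == nil {
			resp.Body.Close()
			return nil
		}
		time.Sleep(200 * time.Millisecond)
	}
	return fmt.Errorf("endpoint never became ready: %w", err)
}

func main() {
	// Local stand-in listener so the sketch runs without a cluster.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go http.Serve(ln, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	if err := probe("http://"+ln.Addr().String()+"/v1/models", 10); err != nil {
		panic(err)
	}
	fmt.Println("connectivity OK")
}
```

Doing the probe from the Go process is what removes the need for the in-cluster curlimages/curl Job and its Docker Hub pull.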

Addresses review feedback from #1014 to move away from bash deployment
scripts for readability, type safety, and concurrent model deployment.

Key improvements:
- Models 2..N deploy concurrently via goroutines (bash was sequential)
- Connectivity verification uses kubectl port-forward from the Go
  process, eliminating the in-cluster curl Job and its Docker Hub image
  (curlimages/curl:latest)
- Kubernetes resources (Gateway, HTTPRoute) created via dynamic client
  instead of heredoc YAML
- Proper error handling and structured logging

The Go tool is invoked via `go run ./deploy/multimodel` from the same
Makefile targets (deploy-multi-model-infra, undeploy-multi-model-infra).
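The dynamic-client approach replaces heredoc YAML with programmatic object construction. This sketch shows only the object-building half using plain maps; the real code would wrap such a map in `unstructured.Unstructured` and create it through client-go's dynamic client, and the field values here (gateway class, listener port) are illustrative:

```go
package main

import "fmt"

// gatewayObject builds a Gateway API resource as a plain object map —
// the shape the dynamic client consumes — instead of templating YAML.
func gatewayObject(name, ns string) map[string]any {
	return map[string]any{
		"apiVersion": "gateway.networking.k8s.io/v1",
		"kind":       "Gateway",
		"metadata": map[string]any{
			"name":      name,
			"namespace": ns,
		},
		"spec": map[string]any{
			"gatewayClassName": "istio", // illustrative value
			"listeners": []any{
				map[string]any{
					"name":     "http",
					"port":     int64(80),
					"protocol": "HTTP",
				},
			},
		},
	}
}

func main() {
	obj := gatewayObject("multi-model-inference-gateway", "llm-d")
	fmt.Println(obj["kind"], obj["metadata"].(map[string]any)["name"])
}
```

Building the object in Go means a typo in a field name fails fast at review or test time rather than as a server-side YAML parse error mid-deploy.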

Made-with: Cursor
@github-actions
Contributor

GPU Pre-flight Check ✅

GPUs are available for e2e-openshift tests. Proceeding with deployment.

Resource       Total   Allocated   Available
GPUs           50      39          11

Cluster        Value
Nodes          16 (7 with GPUs)
Total CPU      993 cores
Total Memory   10383 Gi
GPUs required  4 (min) / 6 (recommended)

- Add INSTALL_GATEWAY_CTRLPLANE to Makefile passthrough (default: false
  for shared clusters with existing Istio)
- Set E2E_TESTS_ENABLED=true to prevent interactive prompts in install.sh
- Guard modelArtifacts.labels yq call in infra_llmd.sh to avoid schema
  validation errors on chart versions that don't support custom labels
- Remove unused import in main.go

Made-with: Cursor
@kahilam kahilam requested a review from asm582 April 15, 2026 18:59

@asm582
Collaborator

asm582 commented Apr 15, 2026

Thanks, please test this PR locally with a command similar to:

abhishekmalvankar@wecm-9-67-159-78 llm-d-workload-variant-autoscaler % make undeploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=asmalvan-test-3 LLMD_NS=asmalvan-test-3 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B" && \
make deploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=asmalvan-test-3 LLMD_NS=asmalvan-test-3 \
  NAMESPACE_SCOPED=true SKIP_BUILD=true \
  DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B" && \
make test-multi-model-scaling \
  ENVIRONMENT=openshift \
  LLMD_NS=asmalvan-test-3 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

Please share a snippet of the passing test output. I know this is manual for now.

@kahilam
Collaborator Author

kahilam commented Apr 15, 2026

Yes, tested locally on OpenShift cluster (wva-bench-test namespace). Full deploy + undeploy + benchmark run completed successfully:

Both models deployed and reachable through Gateway

make test-multi-model-scaling passed (exit code 0)
make undeploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"
make deploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  NAMESPACE_SCOPED=true SKIP_BUILD=true \
  DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"
make test-multi-model-scaling \
  ENVIRONMENT=openshift \
  LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"
  

@kahilam kahilam requested a review from lionelvillard April 15, 2026 19:18
@asm582
Collaborator

asm582 commented Apr 15, 2026

@kahilam linter tests are failing, please fix.

kahilam added 2 commits April 15, 2026 14:29
- Replace fmt.Sprintf with string concatenation (perfsprint)
- Preallocate rules slice (prealloc)
- Remove unused ctx param from detectInferencePoolAPIGroup (unparam)
- Change verifyInferencePools to not return always-nil error (unparam)
- Remove unused portStr method and strconv import (unused)

Made-with: Cursor
- Use strconv.Itoa instead of fmt.Sprintf for int conversion (perfsprint)
- Use string concatenation for svc/ prefix (perfsprint)
- Remove trailing blank line in portforward.go (gofmt)

Made-with: Cursor
@asm582
Collaborator

asm582 commented Apr 15, 2026

> Yes, tested locally on OpenShift cluster (wva-bench-test namespace). Full deploy + undeploy + benchmark run completed successfully:
>
> Both models deployed and reachable through Gateway
>
> make test-multi-model-scaling passed (exit code 0)
> make undeploy-multi-model-infra \
>   ENVIRONMENT=openshift \
>   WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
>   MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"
> make deploy-multi-model-infra \
>   ENVIRONMENT=openshift \
>   WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
>   NAMESPACE_SCOPED=true SKIP_BUILD=true \
>   DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
>   MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"
> make test-multi-model-scaling \
>   ENVIRONMENT=openshift \
>   LLMD_NS=wva-bench-test \
>   MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

@kahilam, with the new changes, could you please share the output of the command above?

@kahilam
Collaborator Author

kahilam commented Apr 15, 2026

Re-ran the full test with the latest lint fix commits (cac620d, fa005ec):

make undeploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

make deploy-multi-model-infra \
  ENVIRONMENT=openshift \
  WVA_NS=wva-bench-test LLMD_NS=wva-bench-test \
  NAMESPACE_SCOPED=true SKIP_BUILD=true \
  DECODE_REPLICAS=1 IMG_TAG=v0.6.0 LLM_D_RELEASE=v0.6.0 \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

make test-multi-model-scaling \
  ENVIRONMENT=openshift \
  LLMD_NS=wva-bench-test \
  MODELS="Qwen/Qwen3-0.6B,unsloth/Meta-Llama-3.1-8B"

Results:

  • Both models deployed and reachable through Gateway
  • HPA scaled both models from 1→2 replicas under load
  • Scaling monitor ran for 600s+ with stable 2/2 ready replicas
Ran 1 of 6 Specs in 832.610 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 5 Skipped
--- PASS: TestBenchmark (832.61s)
PASS
ok  github.com/llm-d/llm-d-workload-variant-autoscaler/test/benchmark  834.201s

==========================================
Multi-model benchmark completed. Exit code: 0
==========================================

@kahilam
Collaborator Author

kahilam commented Apr 15, 2026

@asm582 FYI.

Full benchmark output:
══════════════════════════════════════════════════════════════════
  MULTI-MODEL SCALING BENCHMARK RESULTS
  Models: 2
══════════════════════════════════════════════════════════════════

  ┌────────────────────────────────────────────────────────────
  │ MODEL: Qwen/Qwen3-0.6B
  │ Slug:  qwen-qwen3-0-6b
  ├────────────────────────────────────────────────────────────
  │ Load Job:        SuccessCriteriaMet
  │ Duration:        640s
  │ Final Replicas:  spec=2 ready=2
  │ Max Replicas:    2
  │ Avg Replicas:    1.98
  ├── Prometheus Metrics ──────────────────────────────────────
  │ Avg KV Cache:    0.3467
  │ Avg Queue Depth: 9.90
  │ Avg EPP Queue:   256.83
  ├── GuideLLM Results ────────────────────────────────────────
  │ Achieved RPS:    7.76
  │ Errors:          5125
  │ Incomplete:      505
  ├── Replica Timeline (42 snapshots) ─────────────────────────
  │   t=15s  spec=1  ready=1
  │   t=30s  spec=2  ready=1
  │   t=120s spec=2  ready=2
  │   ... (stable at spec=2 ready=2 through t=630s)
  └────────────────────────────────────────────────────────────

  ┌────────────────────────────────────────────────────────────
  │ MODEL: unsloth/Meta-Llama-3.1-8B
  │ Slug:  unsloth-meta-llama-3-1-8b
  ├────────────────────────────────────────────────────────────
  │ Load Job:        SuccessCriteriaMet
  │ Duration:        640s
  │ Final Replicas:  spec=2 ready=2
  │ Max Replicas:    2
  │ Avg Replicas:    1.98
  ├── Prometheus Metrics ──────────────────────────────────────
  │ Avg KV Cache:    0.5607
  │ Avg Queue Depth: 34.21
  │ Avg EPP Queue:   183.24
  ├── GuideLLM Results ────────────────────────────────────────
  │ Achieved RPS:    6.24
  │ Errors:          6236
  │ Incomplete:      511
  ├── Replica Timeline (42 snapshots) ─────────────────────────
  │   t=15s  spec=1  ready=1
  │   t=30s  spec=2  ready=1
  │   t=150s spec=2  ready=2
  │   ... (stable at spec=2 ready=2 through t=630s)
  └────────────────────────────────────────────────────────────

Ran 1 of 6 Specs in 832.610 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 5 Skipped
--- PASS: TestBenchmark (832.61s)
PASS
ok  github.com/llm-d/llm-d-workload-variant-autoscaler/test/benchmark  834.201s

==========================================
Multi-model benchmark completed. Exit code: 0
==========================================

@asm582
Collaborator

asm582 commented Apr 15, 2026

/lgtm
/approve

@github-actions github-actions bot added the lgtm Looks good to me, indicates that a PR is ready to be merged. label Apr 15, 2026
@kahilam
Collaborator Author

kahilam commented Apr 15, 2026

/ok-to-test

@github-actions
Contributor

🚀 Kind E2E (full) triggered by /ok-to-test

View the Kind E2E workflow run

@github-actions
Contributor

🚀 OpenShift E2E — approve and run (/ok-to-test)

View the OpenShift E2E workflow run

@kahilam
Collaborator Author

kahilam commented Apr 15, 2026

The VA=1 issue: the WVA controller is stuck in a transition loop. HPA scales to 2, the VA still desires 1, the controller treats desired(1) != current(2) as "in transition", blocks its own scale-up decision, and keeps trying to scale down to 1. We are also seeing a pod/pod_name label mismatch for dispatch-rate metrics on all pods. This is an HPA/VA conflict in the test setup, not related to the Go conversion.
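In simplified form (illustrative only, not the actual WVA source), the deadlock comes from a transition check like this, where two actuators with different targets keep each other perpetually "in transition":

```go
package main

import "fmt"

// inTransition mirrors the described behavior in simplified form: the
// controller treats any mismatch between desired and current replicas
// as an in-flight transition and holds back new scaling decisions.
func inTransition(desired, current int32) bool {
	return desired != current
}

func main() {
	// VA desires 1 while HPA has already scaled to 2: the VA sees a
	// transition, blocks its own scale-up, and the loop never settles.
	fmt.Println(inTransition(1, 2))
}
```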

@kahilam kahilam enabled auto-merge (squash) April 15, 2026 23:24
@asm582
Collaborator

asm582 commented Apr 16, 2026

Thanks for this PR, I do see scale up and scale down:

  ══════════════════════════════════════════════════════════════════
    MULTI-MODEL SCALING BENCHMARK RESULTS
    Models: 2
  ══════════════════════════════════════════════════════════════════

    ┌────────────────────────────────────────────────────────────
    │ MODEL: Qwen/Qwen3-0.6B
    │ Slug:  qwen-qwen3-0-6b
    ├────────────────────────────────────────────────────────────
    │ Load Job:        SuccessCriteriaMet
    │ Duration:        640s
    │ Final Replicas:  spec=2 ready=2
    │ Max Replicas:    3
    │ Avg Replicas:    2.62
    ├── Prometheus Metrics ──────────────────────────────────────
    │ Avg KV Cache:    0.2969
    │ Avg Queue Depth: 9.55
    │ Avg EPP Queue:   71.10
    ├── GuideLLM Results ────────────────────────────────────────
    │ Achieved RPS:    11.62
    │ TTFT (ms):       {"count":7440,"max":39492.857217788696,"mean":5748.447952699918,"median":68.22490692138672,"min":28.84078025817871,"mode":39.40463066101074,"pdf":null,"percentiles":{"p001":29.954195022583008,"p01":31.963109970092773,"p05":34.35778617858887,"p10":36.066532135009766,"p25":41.10145568847656,"p50":68.22490692138672,"p75":9606.345653533936,"p90":16588.982582092285,"p95":25162.424325942993,"p99":37522.39418029785,"p999":39151.00383758545},"std_dev":8957.843630675445,"total_sum":42768452.76808739,"variance":80242962.51163264}
    │ ITL (ms):        {"count":7432560,"max":38.22515438030193,"mean":13.06922665785656,"median":6.103248806209774,"min":2.7977484721201913,"mode":2.7977484721201913,"pdf":null,"percentiles":{"p001":2.8474922772045725,"p01":3.1527236655906394,"p05":3.4249353933859394,"p10":3.610821696253749,"p25":4.0013697054293065,"p50":6.103248806209774,"p75":25.06122885046301,"p90":29.460214160464787,"p95":30.052853776169968,"p99":32.98202076473751,"p999":37.47222397301171},"std_dev":10.668411937890385,"total_sum":97137811.28811836,"variance":113.81501327652208}
    │ Throughput:      {"count":7440000,"max":121266.6357421875,"mean":12423.798953980498,"median":10246.671009771986,"min":0.22895657302290076,"mode":0.22895657302290076,"pdf":null,"percentiles":{"p001":0.22895657302290076,"p01":0.2289822721387534,"p05":1915.207305936073,"p10":3363.515637530072,"p25":5722.106412005457,"p50":10246.671009771986,"p75":16743.72854291417,"p90":24775.589351894218,"p95":30335.207403608365,"p99":43206.74057825855,"p999":60300.76601002464},"std_dev":9228.151729703304,"total_sum":92433064217.61491,"variance":85158784.3464261}
    │ Errors:          4094
    │ Incomplete:      121
    ├── Replica Timeline (42 snapshots) ─────────────────────────
    │   t=15s  spec=1  ready=1
    │   t=30s  spec=2  ready=1
    │   t=45s  spec=2  ready=1
    │   t=60s  spec=2  ready=1
    │   t=75s  spec=2  ready=1
    │   t=90s  spec=2  ready=1
    │   t=105s  spec=2  ready=1
    │   t=120s  spec=2  ready=2
    │   t=135s  spec=2  ready=2
    │   t=150s  spec=2  ready=2
    │   t=165s  spec=2  ready=2
    │   t=180s  spec=3  ready=2
    │   t=195s  spec=3  ready=2
    │   t=210s  spec=3  ready=2
    │   t=225s  spec=3  ready=2
    │   t=240s  spec=3  ready=2
    │   t=255s  spec=3  ready=2
    │   t=270s  spec=3  ready=2
    │   t=285s  spec=3  ready=2
    │   t=300s  spec=3  ready=3
    │   t=315s  spec=3  ready=3
    │   t=330s  spec=3  ready=3
    │   t=345s  spec=3  ready=3
    │   t=360s  spec=3  ready=3
    │   t=375s  spec=3  ready=3
    │   t=390s  spec=3  ready=3
    │   t=405s  spec=3  ready=3
    │   t=420s  spec=3  ready=3
    │   t=435s  spec=3  ready=3
    │   t=450s  spec=3  ready=3
    │   t=465s  spec=3  ready=3
    │   t=480s  spec=3  ready=3
    │   t=495s  spec=3  ready=3
    │   t=510s  spec=3  ready=3
    │   t=525s  spec=3  ready=3
    │   t=540s  spec=3  ready=3
    │   t=555s  spec=3  ready=3
    │   t=570s  spec=3  ready=3
    │   t=585s  spec=2  ready=2
    │   t=600s  spec=2  ready=2
    │   t=615s  spec=2  ready=2
    │   t=630s  spec=2  ready=2
    └────────────────────────────────────────────────────────────

    ┌────────────────────────────────────────────────────────────
    │ MODEL: unsloth/Meta-Llama-3.1-8B
    │ Slug:  unsloth-meta-llama-3-1-8b
    ├────────────────────────────────────────────────────────────
    │ Load Job:        SuccessCriteriaMet
    │ Duration:        640s
    │ Final Replicas:  spec=2 ready=2
    │ Max Replicas:    3
    │ Avg Replicas:    2.48
    ├── Prometheus Metrics ──────────────────────────────────────
    │ Avg KV Cache:    0.3402
    │ Avg Queue Depth: 11.98
    │ Avg EPP Queue:   76.33
    ├── GuideLLM Results ────────────────────────────────────────
    │ Achieved RPS:    5.64
    │ TTFT (ms):       {"count":3612,"max":35652.39071846008,"mean":8797.05326283889,"median":400.432825088501,"min":74.51033592224121,"mode":128.53074073791504,"pdf":null,"percentiles":{"p001":76.00045204162598,"p01":79.52380180358887,"p05":83.17351341247559,"p10":86.70234680175781,"p25":142.32182502746582,"p50":400.432825088501,"p75":16011.017084121704,"p90":27655.20977973938,"p95":32447.837591171265,"p99":34792.26303100586,"p999":35578.76014709473},"std_dev":11445.213173063228,"total_sum":31774956.38537407,"variance":130992904.57686004}
    │ ITL (ms):        {"count":3608388,"max":47.51579253165214,"mean":19.854888642225824,"median":19.79256797958542,"min":5.188541011409359,"mode":5.188541011409359,"pdf":null,"percentiles":{"p001":5.541122234142101,"p01":6.002523519613363,"p05":6.436811672435986,"p10":6.652015107530016,"p25":7.275045335710466,"p50":19.79256797958542,"p75":31.282470987604427,"p90":38.150784250971554,"p95":40.05243398763754,"p99":42.999861118671774,"p999":46.20621464512608},"std_dev":12.756438617742925,"total_sum":71644141.91794395,"variance":162.72672620824304}
    │ Throughput:      {"count":3612000,"max":89328.576959574,"mean":6022.24663721846,"median":4500.326180257511,"min":0.24693105606477062,"mode":0.24789795527357833,"pdf":null,"percentiles":{"p001":0.24693105606477062,"p01":0.24789795527357833,"p05":726.7624561915939,"p10":1473.0872238165603,"p25":2622.527797753948,"p50":4500.326180257511,"p75":7728.291623740843,"p90":12693.84180716234,"p95":16336.618384454037,"p99":25556.848027897267,"p999":38486.42719422505},"std_dev":5234.371706959363,"total_sum":21752354853.633076,"variance":27398647.166616675}
    │ Errors:          7956
    │ Incomplete:      119
    ├── Replica Timeline (42 snapshots) ─────────────────────────
    │   t=15s  spec=1  ready=1
    │   t=30s  spec=1  ready=1
    │   t=45s  spec=1  ready=1
    │   t=60s  spec=2  ready=1
    │   t=75s  spec=2  ready=1
    │   t=90s  spec=2  ready=1
    │   t=105s  spec=2  ready=1
    │   t=120s  spec=2  ready=1
    │   t=135s  spec=2  ready=1
    │   t=150s  spec=2  ready=1
    │   t=165s  spec=2  ready=1
    │   t=180s  spec=2  ready=2
    │   t=195s  spec=2  ready=2
    │   t=210s  spec=2  ready=2
    │   t=225s  spec=2  ready=2
    │   t=240s  spec=2  ready=2
    │   t=255s  spec=2  ready=2
    │   t=270s  spec=2  ready=2
    │   t=285s  spec=2  ready=2
    │   t=300s  spec=3  ready=2
    │   t=315s  spec=3  ready=2
    │   t=330s  spec=3  ready=2
    │   t=345s  spec=3  ready=2
    │   t=360s  spec=3  ready=2
    │   t=375s  spec=3  ready=2
    │   t=390s  spec=3  ready=2
    │   t=405s  spec=3  ready=2
    │   t=420s  spec=3  ready=3
    │   t=435s  spec=3  ready=3
    │   t=450s  spec=3  ready=3
    │   t=465s  spec=3  ready=3
    │   t=480s  spec=3  ready=3
    │   t=495s  spec=3  ready=3
    │   t=510s  spec=3  ready=3
    │   t=525s  spec=3  ready=3
    │   t=540s  spec=3  ready=3
    │   t=555s  spec=3  ready=3
    │   t=570s  spec=3  ready=3
    │   t=585s  spec=3  ready=3
    │   t=600s  spec=3  ready=3
    │   t=615s  spec=3  ready=3
    │   t=630s  spec=3  ready=3
    └────────────────────────────────────────────────────────────

  STEP: Saving multi-model benchmark results to file @ 04/15/26 22:23:37.201
  Results saved to /tmp/multi-model-benchmark-results.json
  Multi-model benchmark complete — cleaning up
    Scaled ms-qwen-qwen3-0-6b-llm-d-modelservice-decode back to 1
    Scaled ms-unsloth-meta-llama-3-1-8b-llm-d-modelservice-decode back to 1
• [768.479 seconds]
------------------------------
[AfterSuite] 
/Users/abhishekmalvankar/go-conv/llm-d-workload-variant-autoscaler/test/benchmark/suite_test.go:131
  STEP: Killing Prometheus port-forward @ 04/15/26 22:23:43.289
[AfterSuite] PASSED [0.000 seconds]
------------------------------

Ran 1 of 6 Specs in 769.826 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 5 Skipped
--- PASS: TestBenchmark (769.83s)
PASS
ok      github.com/llm-d/llm-d-workload-variant-autoscaler/test/benchmark       771.049s

==========================================
Multi-model benchmark completed. Exit code: 0
==========================================
abhishekmalvankar@wecm-9-67-148-249 llm-d-workload-variant-autoscaler % date
Wed Apr 15 22:24:07 EDT 2026
abhishekmalvankar@wecm-9-67-148-249 llm-d-workload-variant-autoscaler % oc project
Using project "asmalvan-test-6" on server "https://api.pokprod001.ete14.res.ibm.com:6443".
abhishekmalvankar@wecm-9-67-148-249 llm-d-workload-variant-autoscaler % oc get pods
NAME                                                              READY   STATUS    RESTARTS   AGE
gaie-qwen-qwen3-0-6b-epp-746d56c64f-r789x                         1/1     Running   0          14m
gaie-unsloth-meta-llama-3-1-8b-epp-6cfd55c595-s9xxd               1/1     Running   0          14m
ms-qwen-qwen3-0-6b-llm-d-modelservice-decode-75c84475f8-mx5xr     1/1     Running   0          20m
ms-unsloth-meta-llama-3-1-8b-llm-d-modelservice-decode-779phn6v   1/1     Running   0          17m
multi-model-inference-gateway-istio-f5f74b8f9-5wkbz               1/1     Running   0          17m
workload-variant-autoscaler-controller-manager-79b48b6cc5-9b9hh   1/1     Running   0          18m
workload-variant-autoscaler-controller-manager-79b48b6cc5-gjdqt   1/1     Running   0          19m

@asm582
Collaborator

asm582 commented Apr 16, 2026

/lgtm
/approve

@asm582 asm582 disabled auto-merge April 16, 2026 02:27
Comment thread: deploy/lib/infra_llmd.sh
model_label_value=$(echo "$MODEL_ID" | sed 's|.*/||; s|\.|-|g') # e.g. "Qwen3-0-6B", "Meta-Llama-3.1-8B"
log_info "Setting llm-d.ai/model label to: $model_label_value (unique per model)"
yq eval ".modelArtifacts.labels.\"llm-d.ai/model\" = \"$model_label_value\"" -i "$LLM_D_MODELSERVICE_VALUES"
if yq eval '.modelArtifacts | has("labels")' "$LLM_D_MODELSERVICE_VALUES" 2>/dev/null | grep -q "true"; then
Collaborator

@kahilam, do we know why we need these changes?

Collaborator Author

The modelservice Helm chart has multiple released versions. Newer versions allow setting custom labels; older ones don't, and setting a label on an older chart fails Helm's schema validation and aborts the deployment. The guard checks whether the chart supports labels before trying to set one, so it works with both old and new chart versions.

