feat: add support for podman + bump version for llm-d and all images#857

Open
zdtsw wants to merge 11 commits into llm-d:main from zdtsw:chore_fix_2

Conversation


@zdtsw zdtsw commented Mar 7, 2026

Changes

  • use env variable LLM_D_RELEASE to control all images in deploy/install.sh
  • clone llm-d locally, reusing the existing checkout only if it matches the required release version
  • use env variable CONTAINER_TOOL to support podman on Fedora
  • remove/update *ignore files
  • update the default InferencePool API version to v1
  • fix the scale-from-zero e2e test: the GitHub Action / make target only runs on "kind"; the model name must be random; the guide name must match what is set in gaie-sim; restart the deployment after flow control is enabled, and again after the InferencePool CRD is applied in the cluster
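The two new environment variables follow the usual shell default-value pattern; this is an illustrative sketch (variable names and defaults come from the diff in this PR, the combined invocation in the comment is hypothetical):

```shell
#!/bin/sh
# Sketch of the env-var defaults this PR adds to the deploy scripts.
# A caller on Fedora might override both, e.g.:
#   CONTAINER_TOOL=podman LLM_D_RELEASE=v0.5.1 ./deploy/install.sh
CONTAINER_TOOL=${CONTAINER_TOOL:-docker}   # docker unless overridden
LLM_D_RELEASE=${LLM_D_RELEASE:-v0.5.1}     # drives all llm-d image tags
echo "tool=$CONTAINER_TOOL release=$LLM_D_RELEASE"
```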

Notes

Without the change, it runs with:

NAME        NAMESPACE   CHART                                                                        VERSION   DURATION
infra-sim   llm-d-sim   llm-d-infra/llm-d-infra                                                      v1.3.3          0s
gaie-sim    llm-d-sim   oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool   v1.0.1          1s
ms-sim      llm-d-sim   llm-d-modelservice/llm-d-modelservice                                        v0.2.11         1s

Test

go test -v -timeout 20m -ginkgo.focus="Scale-From-Zero Feature" -ginkgo.v .
=== RUN   TestE2E
Running Suite: E2E Test Suite - /home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e
============================================================================================
Random Seed: 1773071948

Will run 6 of 48 specs
------------------------------
[BeforeSuite] 
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/suite_test.go:49
  STEP: Loading configuration from environment @ 03/09/26 16:59:08.259
  === E2E Test Configuration ===
  Environment: kind-emulator
  WVA Namespace: workload-variant-autoscaler-system
  LLMD Namespace: llm-d-sim
  Use Simulator: true
  Scale-to-Zero Enabled: false
  Scaler Backend: prometheus-adapter
  Model ID: unsloth/Meta-Llama-3.1-8B
  Load Strategy: synthetic
  ==============================

  STEP: Initializing Kubernetes client @ 03/09/26 16:59:08.259
  STEP: Verifying WVA controller is running @ 03/09/26 16:59:08.263
  STEP: Verifying llm-d infrastructure @ 03/09/26 16:59:08.271
  STEP: Verifying Prometheus is available @ 03/09/26 16:59:08.272
  STEP: Restarting prometheus-adapter pods @ 03/09/26 16:59:08.274
  Deleted prometheus-adapter pod: prometheus-adapter-6748c5c5c6-b8q64
  Deleted prometheus-adapter pod: prometheus-adapter-6748c5c5c6-d8js6
  STEP: Waiting for prometheus-adapter pods to be ready @ 03/09/26 16:59:08.284
  prometheus-adapter pods restarted and ready
  BeforeSuite completed successfully - infrastructure ready
[BeforeSuite] PASSED [0.031 seconds]
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
Scale-From-Zero Feature Initial state verification should have VariantAutoscaling resource created [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:219
  STEP: Waiting for InferencePool to be reconciled (allows time for controller to register it in datastore) @ 03/09/26 16:59:08.29
  Looking for EPP service: gaie-sim-epp in namespace: llm-d-sim
  STEP: Creating model service deployment with 0 initial replicas @ 03/09/26 16:59:13.301
  STEP: Scaling deployment to 0 replicas @ 03/09/26 16:59:13.309
  STEP: Creating service to expose model server @ 03/09/26 16:59:13.319
  STEP: Creating ServiceMonitor for metrics scraping @ 03/09/26 16:59:13.326
  STEP: Verifying deployment is at 0 replicas @ 03/09/26 16:59:13.335
  STEP: Creating VariantAutoscaling resource @ 03/09/26 16:59:18.341
  STEP: Creating scaler with minReplicas=0 (HPA or ScaledObject per backend) @ 03/09/26 16:59:18.35
  STEP: Waiting for VA to be ready and InferencePool to be available in datastore @ 03/09/26 16:59:18.354
  Scale-from-zero test setup complete with deployment at 0 replicas
  STEP: Verifying VariantAutoscaling exists @ 03/09/26 17:00:18.383
  VariantAutoscaling resource verified: scale-from-zero-va
• [70.095 seconds]
------------------------------
Scale-From-Zero Feature Initial state verification should verify deployment starts at zero replicas [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:232
  STEP: Checking deployment has 0 replicas @ 03/09/26 17:00:18.385
  Deployment verified at 0 replicas
• [0.001 seconds]
------------------------------
Scale-From-Zero Feature Initial state verification should have scaler configured with minReplicas=0 [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:246
  STEP: Verifying HPA allows scale-to-zero @ 03/09/26 17:00:18.386
• [0.001 seconds]
------------------------------
Scale-From-Zero Feature Scale-from-zero with pending requests should detect pending requests and trigger scale-from-zero [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:284
  STEP: Discovering inference gateway service @ 03/09/26 17:00:18.387
  Found inference gateway service: infra-sim-inference-gateway-istio
  STEP: Creating a job to send requests while deployment is at zero @ 03/09/26 17:00:18.389
  Created scale-from-zero trigger job: scale-from-zero-trigger-1773072018
  STEP: Waiting for job pod to be running and sending requests @ 03/09/26 17:00:18.397
  Job pod is running and sending requests
  STEP: Waiting for requests to queue up in EPP flow control queue @ 03/09/26 17:00:23.405
  STEP: Monitoring VariantAutoscaling for scale-from-zero decision @ 03/09/26 17:00:33.414
  VA DesiredOptimizedAlloc.NumReplicas: 1 (waiting for > 0)
    MetricsAvailable: True/ScaleFromZero (Scaled from zero due to pending requests)
  Scale-from-zero engine detected pending requests and recommended scale-up
• [15.028 seconds]
------------------------------
Scale-From-Zero Feature Scale-from-zero with pending requests should scale deployment up from zero [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:363
  STEP: Monitoring deployment for actual scale-up from zero @ 03/09/26 17:00:33.415
  Current replicas: 1, ready: 1 (waiting for > 0)
  Deployment successfully scaled up from zero
• [0.001 seconds]
------------------------------
Scale-From-Zero Feature Scale-from-zero with pending requests should successfully process requests after scaling up [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:388
  STEP: Verifying the trigger job completes successfully @ 03/09/26 17:00:33.417
  Requests processed successfully after scale-from-zero
  STEP: Cleaning up trigger job @ 03/09/26 17:01:33.448
  STEP: Cleaning up scale-from-zero test resources @ 03/09/26 17:01:33.452
  Successfully deleted ServiceMonitor scale-from-zero-ms-monitor
• [60.059 seconds]
------------------------------
SSSSSSS
------------------------------
[AfterSuite] 
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/suite_test.go:215
  STEP: Cleaning up any leftover test resources @ 03/09/26 17:01:33.476
  Cleaning up test resources...
  Keeping Kind cluster for debugging (set DELETE_CLUSTER=true to delete)
[AfterSuite] PASSED [0.017 seconds]
------------------------------

Ran 6 of 48 Specs in 145.235 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 42 Skipped
--- PASS: TestE2E (145.24s)
PASS
ok      github.com/llm-d/llm-d-workload-variant-autoscaler/test/e2e     145.256s

Copilot AI review requested due to automatic review settings March 7, 2026 18:34

zdtsw commented Mar 7, 2026

cc @shuynh2017


Copilot AI left a comment


Pull request overview

Adds configurability to deployment scripts to better support local development environments (notably Podman) and to centralize llm-d version/image selection so deployments stay consistent across components.

Changes:

  • Introduces LLM_D_RELEASE-driven defaults for llm-d component images in deploy/install.sh and updates the default release.
  • Adds CONTAINER_TOOL support (docker vs podman) for Kind-related image handling and container exec usage.
  • Updates docs/ignore files to reflect the new workflow and local repo cloning behavior.
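The centralization described above amounts to deriving each image default from a single release variable; a minimal sketch using the variable names from deploy/install.sh (values shown are the defaults documented in this PR):

```shell
#!/bin/sh
# One release variable supplies the default tag for every llm-d image;
# each per-image variable can still be overridden individually.
LLM_D_RELEASE=${LLM_D_RELEASE:-v0.5.1}
LLM_D_INFERENCE_SCHEDULER_IMG=${LLM_D_INFERENCE_SCHEDULER_IMG:-"ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE"}
LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
echo "$LLM_D_INFERENCE_SCHEDULER_IMG"
echo "$LLM_D_INFERENCE_SIM_IMG"
```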

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
deploy/kubernetes/create-kind-cluster-with-nvidia.sh Parameterizes container exec calls via CONTAINER_TOOL.
deploy/kind-emulator/install.sh Adds CONTAINER_TOOL and a podman-specific Kind image loading path.
deploy/install.sh Bumps default llm-d release, centralizes image defaults, and changes llm-d repo clone behavior.
deploy/README.md Documents Podman support and centralized LLM_D_RELEASE configuration.
Makefile Passes CONTAINER_TOOL through to deploy script in e2e infra target.
.gitignore Removes outdated ignore entries for old llm-d directory names.
.dockerignore Excludes llm-d/ from Docker build context.

Comment thread deploy/install.sh Outdated

if [ ! -d "$LLM_D_PROJECT" ]; then
log_info "Cloning $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null

Copilot AI Mar 7, 2026


git clone ... &> /dev/null suppresses all clone output. When the clone fails (e.g., tag doesn’t exist, network/auth issues), the script will exit due to set -e but without any actionable error details. Prefer letting stderr through or capturing it and emitting via log_error for easier troubleshooting.

Suggested change
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT > /dev/null


# Container tool (docker or podman)
CONTAINER_TOOL=${CONTAINER_TOOL:-docker}


Copilot AI Mar 7, 2026


Since CONTAINER_TOOL is executed directly (e.g. $CONTAINER_TOOL exec ...), validate it up front (only allow docker/podman) and fail fast with a clear message if the command is not found. This prevents hard-to-debug failures later and avoids running unexpected commands when the env var is mis-set.

Suggested change
# Validate container tool selection early to avoid confusing failures and unexpected commands
case "${CONTAINER_TOOL}" in
docker|podman)
if ! command -v "${CONTAINER_TOOL}" >/dev/null 2>&1; then
echo "Error: CONTAINER_TOOL is set to '${CONTAINER_TOOL}', but the command was not found in PATH." >&2
exit 1
fi
;;
*)
echo "Error: CONTAINER_TOOL must be either 'docker' or 'podman', but is set to '${CONTAINER_TOOL}'." >&2
exit 1
;;
esac

Comment on lines 185 to 189
# Pull a single-platform image so kind load does not hit "content digest not found"
# (multi-platform manifests can reference blobs that are not in the docker save stream).
# (multi-platform manifests can reference blobs that are not in the $CONTAINER_TOOL save stream).
local platform="${KIND_IMAGE_PLATFORM:-}"
if [ -z "$platform" ]; then
case "$(uname -m)" in

Copilot AI Mar 7, 2026


This block introduces $CONTAINER_TOOL in the comment/logic, but the subsequent image pull/inspect in this function still uses hard-coded docker commands. With CONTAINER_TOOL=podman, the pull path will fail. Please switch the pull/inspect commands to use $CONTAINER_TOOL (or branch per tool if flags differ, e.g. --platform).

Comment on lines +211 to +214
local tmp_tar="/tmp/wva-image-$(date +%s).tar"
$CONTAINER_TOOL save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"
rm -f "$tmp_tar"

Copilot AI Mar 7, 2026


The temporary tar archive is only removed on the success path. If save or kind load image-archive fails, the script can leak files in /tmp. Use mktemp and a trap to ensure the archive is cleaned up on all exits (including errors).

Suggested change
local tmp_tar="/tmp/wva-image-$(date +%s).tar"
$CONTAINER_TOOL save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"
rm -f "$tmp_tar"
local tmp_tar
tmp_tar="$(mktemp /tmp/wva-image-XXXXXX.tar)"
trap 'rm -f "$tmp_tar"' EXIT INT TERM
$CONTAINER_TOOL save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"
rm -f "$tmp_tar"
trap - EXIT INT TERM


# Container tool (docker or podman can pass from Makefile)
CONTAINER_TOOL=${CONTAINER_TOOL:-docker}


Copilot AI Mar 7, 2026


CONTAINER_TOOL is used as a command later (e.g. $CONTAINER_TOOL image inspect/save). To avoid confusing failures (or accidental execution of unexpected commands), validate the value early (e.g. only allow docker or podman) and error out with a clear message if it’s unsupported or not installed.

Suggested change
# Validate container tool early to avoid confusing failures or unexpected commands
case "$CONTAINER_TOOL" in
docker|podman)
if ! command -v "$CONTAINER_TOOL" >/dev/null 2>&1; then
echo -e "${RED}Error:${NC} Container tool '$CONTAINER_TOOL' is not installed or not in PATH." >&2
exit 1
fi
;;
*)
echo -e "${RED}Error:${NC} Unsupported CONTAINER_TOOL '$CONTAINER_TOOL'. Only 'docker' or 'podman' are supported." >&2
exit 1
;;
esac

Comment thread deploy/install.sh
Comment on lines +794 to +800
rm -rf "$LLM_D_PROJECT"
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it"
rm -rf "$LLM_D_PROJECT"

Copilot AI Mar 7, 2026


This re-clone logic is destructive: if $LLM_D_PROJECT exists at a different tag (or git describe --exact-match returns unknown), the script unconditionally rm -rfs the directory. That can delete local changes/worktrees unexpectedly. Consider a safer approach (e.g. git fetch --tags + git checkout $LLM_D_RELEASE, or gate deletion behind an explicit LLM_D_FORCE_RECLONE=true / confirmation).

Suggested change
rm -rf "$LLM_D_PROJECT"
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it"
rm -rf "$LLM_D_PROJECT"
if [ "${LLM_D_FORCE_RECLONE:-false}" = "true" ]; then
log_warning "LLM_D_FORCE_RECLONE=true, removing existing directory '$LLM_D_PROJECT' and re-cloning"
rm -rf "$LLM_D_PROJECT"
else
log_info "Attempting to update existing repository in-place to release '$LLM_D_RELEASE'"
(
cd "$LLM_D_PROJECT"
git fetch --tags origin >/dev/null 2>&1
if git rev-parse "$LLM_D_RELEASE" >/dev/null 2>&1; then
git checkout "$LLM_D_RELEASE" >/dev/null 2>&1
log_success "Updated existing $LLM_D_PROJECT repository to $LLM_D_RELEASE"
else
log_error "Release '$LLM_D_RELEASE' not found in existing $LLM_D_PROJECT repo. Set LLM_D_FORCE_RECLONE=true to allow deleting and re-cloning, or update the repository manually."
fi
)
fi
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
if [ "${LLM_D_FORCE_RECLONE:-false}" = "true" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it because LLM_D_FORCE_RECLONE=true"
rm -rf "$LLM_D_PROJECT"
else
log_error "$LLM_D_PROJECT exists but is not a git repository. Move or remove it, or set LLM_D_FORCE_RECLONE=true to allow deletion."
fi

Copilot AI review requested due to automatic review settings March 7, 2026 19:14

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

if ! docker image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make docker-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
if ! $CONTAINER_TOOL image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"

Copilot AI Mar 7, 2026


The suggested fix command in this error message (make $CONTAINER_TOOL-build ...) doesn’t match the Makefile: there is a docker-build target that uses $(CONTAINER_TOOL) internally, but no podman-build target. Update the message to point to the actual build invocation (e.g., make docker-build CONTAINER_TOOL=$CONTAINER_TOOL ...).

Suggested change
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make docker-build CONTAINER_TOOL=$CONTAINER_TOOL IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"

Comment thread deploy/install.sh
Comment on lines +789 to +801
# Clone llm-d repo if not exists, or re-clone if version mismatch detected
if [ -d "$LLM_D_PROJECT/.git" ]; then
# Check current version (try tag first, then branch)
CURRENT_VERSION=$(cd "$LLM_D_PROJECT" && git describe --tags --exact-match 2>/dev/null || git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
if [ "$CURRENT_VERSION" != "$LLM_D_RELEASE" ]; then
log_warning "$LLM_D_PROJECT exists but has version '$CURRENT_VERSION' (expected: $LLM_D_RELEASE)"
rm -rf "$LLM_D_PROJECT"
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it"
rm -rf "$LLM_D_PROJECT"

Copilot AI Mar 7, 2026


Automatically rm -rf-ing an existing $LLM_D_PROJECT directory on version mismatch can unexpectedly delete local changes (or other content if the directory is user-managed). Consider a safer approach: git fetch --tags + git checkout "$LLM_D_RELEASE", or cloning into a versioned directory (e.g., llm-d-$LLM_D_RELEASE) and reusing it when present.

Suggested change
# Clone llm-d repo if not exists, or re-clone if version mismatch detected
if [ -d "$LLM_D_PROJECT/.git" ]; then
# Check current version (try tag first, then branch)
CURRENT_VERSION=$(cd "$LLM_D_PROJECT" && git describe --tags --exact-match 2>/dev/null || git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
if [ "$CURRENT_VERSION" != "$LLM_D_RELEASE" ]; then
log_warning "$LLM_D_PROJECT exists but has version '$CURRENT_VERSION' (expected: $LLM_D_RELEASE)"
rm -rf "$LLM_D_PROJECT"
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it"
rm -rf "$LLM_D_PROJECT"
# Clone llm-d repo if not exists, or update in-place if version mismatch detected
if [ -d "$LLM_D_PROJECT/.git" ]; then
# Check current version (try tag first, then branch)
CURRENT_VERSION=$(cd "$LLM_D_PROJECT" && git describe --tags --exact-match 2>/dev/null || git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
if [ "$CURRENT_VERSION" != "$LLM_D_RELEASE" ]; then
log_warning "$LLM_D_PROJECT exists but has version '$CURRENT_VERSION' (expected: $LLM_D_RELEASE)"
log_info "Attempting to fetch and check out release '$LLM_D_RELEASE' in existing repository"
if ! (cd "$LLM_D_PROJECT" && git fetch --tags origin && git fetch origin "$LLM_D_RELEASE" && git checkout "$LLM_D_RELEASE"); then
log_error "Failed to switch $LLM_D_PROJECT to release '$LLM_D_RELEASE'. Please clean or backup '$LLM_D_PROJECT' and try again, or set LLM_D_PROJECT to a different path."
exit 1
fi
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_error "$LLM_D_PROJECT exists but is not a git repository. Please remove or move this directory, or set LLM_D_PROJECT to a different path."
exit 1

Comment thread deploy/install.sh Outdated

if [ ! -d "$LLM_D_PROJECT" ]; then
log_info "Cloning $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null

Copilot AI Mar 7, 2026


git clone arguments here are unquoted. Since these values come from environment variables, whitespace or unexpected characters in $LLM_D_RELEASE, $LLM_D_OWNER, or $LLM_D_PROJECT can break the command (and make debugging harder). Quote the variables and the URL/path arguments.

Suggested change
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null
git clone -b "$LLM_D_RELEASE" -- "https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git" "$LLM_D_PROJECT" &> /dev/null

Comment thread deploy/install.sh
Comment on lines +871 to +872
# Sanitize model name for k8s label (if MODEL_ID is unsloth/Meta-Llama-3.1-8B, label uses unsloth-Meta-Llama-3.1-8B)
MODEL_ID_SANITIZED=$(echo "$MODEL_ID" | tr '/' '-')

Copilot AI Mar 7, 2026


MODEL_ID_SANITIZED only replaces / with -, but Kubernetes label values must be <=63 chars and match the label-value character rules. Many HuggingFace model IDs can exceed 63 chars or include other invalid characters, causing the deployment/helmfile apply to fail. Consider a more robust sanitization (replace any invalid chars, trim leading/trailing non-alphanumerics, and truncate to 63).

Suggested change
# Sanitize model name for k8s label (if MODEL_ID is unsloth/Meta-Llama-3.1-8B, label uses unsloth-Meta-Llama-3.1-8B)
MODEL_ID_SANITIZED=$(echo "$MODEL_ID" | tr '/' '-')
# Sanitize model name for k8s label:
# - replace '/' and whitespace with '-'
# - replace any remaining invalid chars with '-'
# - trim leading/trailing non-alphanumerics
# - truncate to 63 characters and ensure it ends with an alphanumeric
MODEL_ID_SANITIZED=$(echo "$MODEL_ID" \
| tr '/[:space:]' '-' \
| sed 's/[^A-Za-z0-9_.-]/-/g')
# Trim leading non-alphanumeric characters
MODEL_ID_SANITIZED=${MODEL_ID_SANITIZED##[^A-Za-z0-9]*}
# Trim trailing non-alphanumeric characters
MODEL_ID_SANITIZED=${MODEL_ID_SANITIZED%%[^A-Za-z0-9]*}
# Truncate to 63 characters (Kubernetes label value max length)
MODEL_ID_SANITIZED=${MODEL_ID_SANITIZED:0:63}
# After truncation, ensure it does not end with non-alphanumeric characters
MODEL_ID_SANITIZED=$(echo "$MODEL_ID_SANITIZED" | sed 's/[^A-Za-z0-9]*$//')

Comment thread deploy/README.md
Comment on lines +643 to +648
| `LLM_D_RELEASE` | llm-d release version (controls all llm-d images) | `v0.5.1` |
| `LLM_D_INFERENCE_SCHEDULER_IMG` | Override llm-d inference scheduler image | `ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
| `CONTAINER_TOOL` | Container tool to use (docker or podman) | `docker` |

**Centralized llm-d Version Management**: Setting `LLM_D_RELEASE` automatically configures all llm-d component images to use the same release version. This ensures version consistency across the llm-d inference scheduler and simulator. Individual image variables can override this if needed.

Copilot AI Mar 7, 2026


The docs imply LLM_D_RELEASE “controls all llm-d component images”, but in the scripts some llm-d images are still configured independently (e.g., the inference-sim decode/prefill image is set via LLM_D_INFERENCE_SIM_IMG_REPO/LLM_D_INFERENCE_SIM_IMG_TAG, which defaults to latest in the kind-emulator env script). Consider narrowing the wording to the images that are actually tied to LLM_D_RELEASE, or updating the scripts so the simulator tag defaults to LLM_D_RELEASE as well.

Suggested change
| `LLM_D_RELEASE` | llm-d release version (controls all llm-d images) | `v0.5.1` |
| `LLM_D_INFERENCE_SCHEDULER_IMG` | Override llm-d inference scheduler image | `ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
| `CONTAINER_TOOL` | Container tool to use (docker or podman) | `docker` |
**Centralized llm-d Version Management**: Setting `LLM_D_RELEASE` automatically configures all llm-d component images to use the same release version. This ensures version consistency across the llm-d inference scheduler and simulator. Individual image variables can override this if needed.
| `LLM_D_RELEASE` | llm-d release version (controls default tags for core llm-d component images used by WVA) | `v0.5.1` |
| `LLM_D_INFERENCE_SCHEDULER_IMG` | Override llm-d inference scheduler image | `ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
| `CONTAINER_TOOL` | Container tool to use (docker or podman) | `docker` |
**Centralized llm-d Version Management**: Setting `LLM_D_RELEASE` configures the default release tag for the llm-d component images used by this deployment (specifically, the inference scheduler and simulator images above). This helps maintain version consistency across these core components. Additional llm-d images defined in platform-specific scripts may use their own configuration variables and are not automatically controlled by `LLM_D_RELEASE`, but individual image variables can always override the defaults if needed.

Copilot AI review requested due to automatic review settings March 8, 2026 19:07

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

deploy/kind-emulator/install.sh:200

  • In load_image, the pull path still hard-codes docker pull / docker image inspect. When CONTAINER_TOOL=podman (or docker isn’t installed), this will fail even though the script advertises Podman support. Use $CONTAINER_TOOL consistently for pull/inspect (and note that podman pull doesn’t support --platform in the same way as docker, so you may need a tool-specific branch).
        # Pull a single-platform image so kind load does not hit "content digest not found"
        # (multi-platform manifests can reference blobs that are not in the $CONTAINER_TOOL save stream).
        local platform="${KIND_IMAGE_PLATFORM:-}"
        if [ -z "$platform" ]; then
            case "$(uname -m)" in
                aarch64|arm64) platform="linux/arm64" ;;
                *) platform="linux/amd64" ;;
            esac
        fi
        log_info "Pulling single-platform image for KIND (platform=$platform) to avoid load errors..."
        if ! docker pull --platform "$platform" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"; then
            log_warning "Failed to pull image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' (platform=$platform)"
            log_info "Attempting to use existing local image..."
            if ! docker image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
                log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build or pull the image"
                exit 1

)

// Default to v1alpha2 group if empty
// Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty

Copilot AI Mar 8, 2026


The comment here is grammatically unclear ("already by default use v1 than v1alpha2") and makes the reasoning harder to follow. Please reword it (e.g., "Default to v1 if empty, since llm-d defaults to v1 now").

Suggested change
// Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty
// Default to v1 if empty, since llm-d now defaults to v1 instead of v1alpha2.

$CONTAINER_TOOL exec -ti kind-control-plane umount -R /proc/driver/nvidia

# According to https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html
echo "> Adding/updateding the NVIDIA Helm repository"

Copilot AI Mar 8, 2026


Spelling in the log message: updatedingupdating (makes CLI output look unpolished).

Suggested change
echo "> Adding/updateding the NVIDIA Helm repository"
echo "> Adding/updating the NVIDIA Helm repository"

Comment thread deploy/install.sh
Comment on lines +841 to +842
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"

Copilot AI Mar 8, 2026


The git clone error check is broken because the command is piped into grep -v. With set -o pipefail, the pipeline’s exit status can become grep’s, and a successful clone that only prints “Cloning into …” will likely make grep -v exit 1, causing the script to treat the clone as failed. Avoid piping the clone output into grep for suppression; redirect output instead (or capture output without affecting the exit code).

Suggested change
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
if ! clone_output=$(git clone -b "$LLM_D_RELEASE" -- "https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git" "$LLM_D_PROJECT" 2>&1); then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
printf '%s\n' "$clone_output"

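The exit-status pitfall described above is easy to reproduce. Below is a minimal sketch with a fake clone command standing in for `git clone` (the function names are illustrative, not from the script): with `pipefail` set, filtering the clone's only output line through `grep -v` makes a successful clone look like a failure, while capturing the output first preserves the real exit code.

```shell
#!/usr/bin/env bash
set -o pipefail

fake_clone() {             # succeeds, printing only the progress line
    printf 'Cloning into repo...\n'
    return 0
}

demo_pitfall() {
    # grep -v filters the only line, emits nothing, and exits 1 --
    # with pipefail the whole pipeline then reports failure.
    fake_clone | grep -v "Cloning into"
    echo "pipeline exit: $?"
}

demo_fix() {
    # Capture output first; the if condition now reflects the clone itself,
    # and the output is only printed on failure.
    local out
    if ! out=$(fake_clone 2>&1); then
        printf '%s\n' "$out"
        echo "clone failed"
        return 1
    fi
    echo "clone ok"
}
```

Here `demo_pitfall` prints `pipeline exit: 1` even though the fake clone succeeded, while `demo_fix` correctly reports success.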
Comment thread deploy/install.sh
# TODO: update once the llm-d-inference-scheduler v0.5.0 is released
LLM_D_INFERENCE_SCHEDULER_IMG=${LLM_D_INFERENCE_SCHEDULER_IMG:-"ghcr.io/llm-d/llm-d-inference-scheduler:v0.5.0-rc.1"}
LLM_D_INFERENCE_SCHEDULER_IMG=${LLM_D_INFERENCE_SCHEDULER_IMG:-"ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE"}
LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
Copilot AI Mar 8, 2026

This script now defines LLM_D_INFERENCE_SIM_IMG based on LLM_D_RELEASE, but later it still patches the llm-d values using LLM_D_INFERENCE_SIM_IMG_REPO:$LLM_D_INFERENCE_SIM_IMG_TAG (and LLM_D_INFERENCE_SIM_IMG_REPO is not defined anywhere in this script). To avoid producing an invalid image reference, either define the repo/tag variables here or standardize on a single variable (LLM_D_INFERENCE_SIM_IMG) and update the downstream patching logic accordingly.

Suggested change
LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
LLM_D_INFERENCE_SIM_IMG_REPO=${LLM_D_INFERENCE_SIM_IMG_REPO:-"ghcr.io/llm-d/llm-d-inference-sim"}
LLM_D_INFERENCE_SIM_IMG_TAG=${LLM_D_INFERENCE_SIM_IMG_TAG:-"$LLM_D_RELEASE"}

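An alternative to defining separate repo/tag defaults is to keep the single `LLM_D_INFERENCE_SIM_IMG` variable and derive the repo and tag downstream with parameter expansion. A hypothetical sketch (variable names mirror the script but the split logic is an assumption, and it would need adjusting for registries with ports or digest references):

```shell
#!/usr/bin/env bash
LLM_D_RELEASE="${LLM_D_RELEASE:-v0.5.1}"
LLM_D_INFERENCE_SIM_IMG="${LLM_D_INFERENCE_SIM_IMG:-ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE}"

# Strip from the last ':' to get the repo; strip up to the last ':' for the tag.
LLM_D_INFERENCE_SIM_IMG_REPO="${LLM_D_INFERENCE_SIM_IMG%:*}"
LLM_D_INFERENCE_SIM_IMG_TAG="${LLM_D_INFERENCE_SIM_IMG##*:}"

echo "repo: $LLM_D_INFERENCE_SIM_IMG_REPO"
echo "tag:  $LLM_D_INFERENCE_SIM_IMG_TAG"
```

With the defaults above this yields `ghcr.io/llm-d/llm-d-inference-sim` and `v0.5.1`, so the downstream patching logic can stay on repo/tag variables while users override only one image reference.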
Comment thread deploy/install.sh
elif [ "$POOL_GROUP" = "inference.networking.x-k8s.io" ]; then
POOL_VERSION="v1alpha2"
else
log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)"
Copilot AI Mar 8, 2026

log_error is called here before it is defined later in the script. If an invalid POOL_GROUP is provided, this will result in log_error: command not found and a less clear failure mode. Consider moving the helper function definitions above this validation block, or replace this call with a plain echo/printf + exit 1 at this early point in the script.

Suggested change
log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)"
echo -e "${RED}Error:${NC} Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)" >&2

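The ordering problem can be avoided by defining the logging helpers before any validation runs. A minimal sketch, assuming the script's color conventions (the exact `log_error` body and color codes are assumptions, not copied from install.sh):

```shell
#!/usr/bin/env bash
# Define helpers first so early validation can fail with a clear message.
RED='\033[0;31m'
NC='\033[0m'

log_error() {
    echo -e "${RED}Error:${NC} $*" >&2
}

POOL_GROUP="${POOL_GROUP:-inference.networking.k8s.io}"
case "$POOL_GROUP" in
    inference.networking.k8s.io)   POOL_VERSION="v1" ;;
    inference.networking.x-k8s.io) POOL_VERSION="v1alpha2" ;;
    *)
        log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)"
        exit 1
        ;;
esac
echo "resolved POOL_VERSION: $POOL_VERSION"
```

With the default group this resolves `POOL_VERSION` to `v1`; an unknown group hits `log_error` that is guaranteed to exist at that point.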
Comment thread deploy/install.sh

# Create InferenceModel for second model (maps model name to pool)
# Note: InferenceModel CRD may not be available in all environments
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
Copilot AI Mar 8, 2026

Typo in the TODO comment: InfereceModelRewrite is misspelled, which makes it harder to search for the intended resource/name later. Please correct it to InferenceModelRewrite.

Suggested change
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InferenceModelRewrite instead

Copilot AI review requested due to automatic review settings March 9, 2026 13:46
Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 8 comments.

Comment thread deploy/install.sh
Comment on lines +841 to +844
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
return 1
fi
Copilot AI Mar 9, 2026

The git clone command is piped to grep -v, which changes the pipeline exit code semantics: if clone output is fully filtered (or empty), grep can exit non-zero and make a successful clone look like a failure. Prefer checking git clone’s exit status directly (e.g., run it normally/with -q, redirecting stdout as needed while preserving the clone exit code).

Comment thread deploy/install.sh

# Create InferenceModel for second model (maps model name to pool)
# Note: InferenceModel CRD may not be available in all environments
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
Copilot AI Mar 9, 2026

Typo in comment: InfereceModelRewrite should be InferenceModelRewrite (and/or clarify the intended resource name).

Comment on lines +186 to 189
# (multi-platform manifests can reference blobs that are not in the $CONTAINER_TOOL save stream).
local platform="${KIND_IMAGE_PLATFORM:-}"
if [ -z "$platform" ]; then
case "$(uname -m)" in
Copilot AI Mar 9, 2026

In load_image(), Podman support is only partially applied: this hunk updates the comment to mention $CONTAINER_TOOL, but the subsequent pull/inspect logic still uses docker pull / docker image inspect. With CONTAINER_TOOL=podman and WVA_IMAGE_PULL_POLICY!=IfNotPresent, the script will still require Docker. Suggest switching those commands to $CONTAINER_TOOL (or branching like the Kind load step) so the pull path works with Podman too.

Comment on lines +137 to +142
WVA provides a **single consolidated E2E suite** that runs on Kind clusters with emulated GPUs. Tests create VA, HPA, and model services dynamically as part of the test workflow.

> **Note**: E2E tests are only supported on Kind clusters (`ENVIRONMENT=kind-emulator`). The test fixtures and labels are configured specifically for the Kind emulator deployment which uses the `simulated-accelerators` guide from llm-d. Running E2E tests on other environments (OpenShift, generic Kubernetes) is not supported for now.

- **Location**: `test/e2e/`
- **Environments**: Kind (emulated), OpenShift, or generic Kubernetes
- **Supported Environment**: Kind (emulated GPUs only)
Copilot AI Mar 9, 2026

This documentation change says E2E tests are only supported on Kind and that OpenShift/generic Kubernetes is not supported. That conflicts with the Makefile’s stated goal of “environment-agnostic” consolidated E2E targets (which accept ENVIRONMENT=$(ENVIRONMENT)) and the presence of an OpenShift E2E workflow in .github/workflows/ci-e2e-openshift.yaml. Please align the docs with actual supported environments (either restore OpenShift/Kubernetes support in the docs, or update the Makefile/CI if support was intentionally dropped).

Comment on lines +157 to +159
// Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty
if poolGroup == "" {
poolGroup = PoolGroupV1Alpha2
poolGroup = PoolGroupV1
Copilot AI Mar 9, 2026

The inline comment is hard to read and grammatically incorrect (“llm-d already by default use v1 than v1alpha2”). Please rephrase so it clearly explains why v1 is the default (e.g., “Default to v1 when poolGroup is empty”).

Comment on lines +179 to 181
if ! $CONTAINER_TOOL image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
else
Copilot AI Mar 9, 2026

This error message suggests running make $CONTAINER_TOOL-build ..., but the Makefile target is still named docker-build (it just uses CONTAINER_TOOL internally). Consider updating the example to something that actually works, e.g. make docker-build CONTAINER_TOOL=$CONTAINER_TOOL IMG=....

Comment on lines +211 to +214
local tmp_tar="/tmp/wva-image-$(date +%s).tar"
$CONTAINER_TOOL save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"
rm -f "$tmp_tar"
Copilot AI Mar 9, 2026

For Podman loading, the temp tar path is predictable and won’t be cleaned up if podman save or kind load fails. Prefer using mktemp (or similar) and a trap to ensure the archive is always removed on exit/error.

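The cleanup pattern suggested above can be sketched as follows. The `save`/`load` commands are shown as comments mirroring the script (not executed here); the point is the `mktemp` path plus an `EXIT` trap, so the archive is removed even when save or load fails partway:

```shell
#!/usr/bin/env bash
# Unpredictable temp path instead of a timestamp-based name.
tmp_tar="$(mktemp)"
# The trap fires on normal exit and on failure paths alike.
trap 'rm -f "$tmp_tar"' EXIT

# "$CONTAINER_TOOL" save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
# kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"

echo "staged archive: $tmp_tar"
```

`kind load image-archive` does not require a `.tar` extension, so the bare `mktemp` path is sufficient.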
Comment on lines +83 to +84
// double ehck flowcontrol is enabled as env var
for _, container := range pod.Spec.Containers {
Copilot AI Mar 9, 2026

Typo in comment: "double ehck" → "double check".

@zdtsw
Collaborator Author

zdtsw commented Mar 9, 2026

/test-e2e-full

@shuynh2017
Collaborator

@zdtsw one failed test, not sure if it's caused by your changes.

@shuynh2017
Collaborator

/ok-to-test

@shuynh2017
Collaborator

@lionelvillard @asm582 there are a few changes here, but they look OK. Please review.

@shuynh2017
Collaborator

@zdtsw, please check the one failing test

@shuynh2017
Collaborator

@zdtsw this may fix the smoke test #870

zdtsw added 10 commits March 11, 2026 09:05
- use env variable LLM_D_RELEASE to control all images in
  deploy/install.sh
- clone llm-d locally and reuse the local copy if it matches the
  required release version
- use env variable CONTAINER_TOOL to support podman on fedora
- remove/update *ignore files

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- old test was based on v1alpha2 of GIE for infpool
- new default is v1 for infpool

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- the label we should use is "llm-d.ai/inference-serving:true"

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- if wva was started before the CRD (v1 or v1alpha2) was installed,
  it hit an empty result

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- since the sim image has a different release, we cannot use the same
  env variable

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- if e2e is only tested on kind with the specific guide gaie-sim, make
  that clear and fix the label values
- llm-d.ai/guide:simulated-accelerators
- llm-d.ai/model:random

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- extend timeout in test
- add INFO log to understand why the VA was not found

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Copilot AI review requested due to automatic review settings March 11, 2026 08:06
Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 6 comments.

Comment thread deploy/README.md
| `WVA_IMAGE_PULL_POLICY` | Image pull policy | `Always` |
| `LLM_D_RELEASE` | llm-d release version (controls all llm-d images) | `v0.5.1` |
| `LLM_D_INFERENCE_SCHEDULER_IMG` | Override llm-d inference scheduler image | `ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
Copilot AI Mar 11, 2026

The config reference documents LLM_D_INFERENCE_SIM_IMG as the way to override the inference-sim image, but deploy/install.sh updates the primary modelservice values using LLM_D_INFERENCE_SIM_IMG_REPO + LLM_D_INFERENCE_SIM_IMG_TAG (and only uses LLM_D_INFERENCE_SIM_IMG for the optional second model). Either align the script to use LLM_D_INFERENCE_SIM_IMG consistently, or update this documentation to describe the actual override variables that take effect.

Suggested change
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG_REPO` | Override primary llm-d inference simulator image repository | `ghcr.io/llm-d/llm-d-inference-sim` |
| `LLM_D_INFERENCE_SIM_IMG_TAG` | Override primary llm-d inference simulator image tag | `$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image for the optional second model | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |

Comment on lines 274 to +281
if !pendingRequestExist {
// Log INFO only when queue exists but model doesn't match
if queueMetricFound {
logger.Info("Scale-from-zero: queue has pending requests but model not matched",
"va", va.Name,
"vaModelID", va.Spec.ModelID,
"queueModels", queueMetricModels)
}
Copilot AI Mar 11, 2026

The new INFO log inside the 100ms scale-from-zero loop can flood logs when the queue has pending requests for other models (queueMetricFound=true but no match). This effectively reintroduces high-volume logging at INFO level; consider downgrading this to DEBUG, adding rate limiting, or logging only on state transition (e.g., first time mismatch is observed).

Comment on lines +137 to 143
WVA provides a **single consolidated E2E suite** that runs on Kind clusters with emulated GPUs. Tests create VA, HPA, and model services dynamically as part of the test workflow.

> **Note**: E2E tests are only supported on Kind clusters (`ENVIRONMENT=kind-emulator`). The test fixtures and labels are configured specifically for the Kind emulator deployment which uses the `simulated-accelerators` guide from llm-d. Running E2E tests on other environments (OpenShift, generic Kubernetes) is not supported for now.

- **Location**: `test/e2e/`
- **Environments**: Kind (emulated), OpenShift, or generic Kubernetes
- **Supported Environment**: Kind (emulated GPUs only)
- **Tiers**: Smoke (~5–10 min) for PRs; full suite (~15–25 min) for comprehensive validation
Copilot AI Mar 11, 2026

The note says E2E tests are only supported on Kind, but this doc section still includes OpenShift-specific quick start instructions and later presents an OpenShift E2E column in the comparison matrix. Either remove/update those OpenShift references, or reword this note to clarify which suites run on which environments (e.g., Kind-only for test/e2e/ but separate OpenShift suites exist).

Comment thread deploy/install.sh
Comment on lines 838 to +844
if [ ! -d "$LLM_D_PROJECT" ]; then
log_info "Cloning $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null
else
log_warning "$LLM_D_PROJECT directory already exists, skipping clone"
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
return 1
fi
log_success "Successfully cloned $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
Copilot AI Mar 11, 2026

The git clone success/failure check is unreliable because the pipeline exit status comes from grep, not git clone. If git clone succeeds but only prints “Cloning into…”, grep -v will return exit code 1 (no output) and this branch will incorrectly treat the clone as failed. Capture git clone's exit code (e.g., avoid a pipe, or use PIPESTATUS) and only filter output for logging after verifying success.

if !isReady {
continue
}
// double ehck flowcontrol is enabled as env var
Copilot AI Mar 11, 2026

Typo in comment: "double ehck" → "double check".

Suggested change
// double ehck flowcontrol is enabled as env var
// double check flowcontrol is enabled as env var

Comment thread deploy/install.sh

# Create InferenceModel for second model (maps model name to pool)
# Note: InferenceModel CRD may not be available in all environments
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
Copilot AI Mar 11, 2026

Typo in TODO comment: "InfereceModelRewrite" should be "InferenceModelRewrite".

Suggested change
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InferenceModelRewrite instead

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
@github-actions
Contributor

github-actions bot commented Apr 3, 2026

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.
