feat: add support for podman + bump version for llm-d and all images#857

Open
zdtsw wants to merge 11 commits into llm-d:main from zdtsw:chore_fix_2

Conversation


@zdtsw zdtsw commented Mar 7, 2026

Changes

  • use env variable LLM_D_RELEASE to control all images in deploy/install.sh
  • clone llm-d locally, reusing the existing checkout only if it matches the required release version
  • use env variable CONTAINER_TOOL to support podman on Fedora
  • remove/update *ignore files
  • update the default InferencePool API version to v1
  • fix the scale-from-zero e2e test: the GitHub Action / make target only runs on "kind"; the model name must be random; the guide name must match what is set in gaie-sim; restart the deployment after flow control is enabled, and again after the InferencePool CRD is applied in the cluster
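The two new environment variables follow the usual shell default-value pattern; this is an illustrative sketch (variable names and defaults come from the diff in this PR, the combined invocation in the comment is hypothetical):

```shell
#!/bin/sh
# Sketch of the env-var defaults this PR adds to the deploy scripts.
# A caller on Fedora might override both, e.g.:
#   CONTAINER_TOOL=podman LLM_D_RELEASE=v0.5.1 ./deploy/install.sh
CONTAINER_TOOL=${CONTAINER_TOOL:-docker}   # docker unless overridden
LLM_D_RELEASE=${LLM_D_RELEASE:-v0.5.1}     # drives all llm-d image tags
echo "tool=$CONTAINER_TOOL release=$LLM_D_RELEASE"
```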

Notes

Without the change, it runs with:

NAME        NAMESPACE   CHART                                                                        VERSION   DURATION
infra-sim   llm-d-sim   llm-d-infra/llm-d-infra                                                      v1.3.3          0s
gaie-sim    llm-d-sim   oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool   v1.0.1          1s
ms-sim      llm-d-sim   llm-d-modelservice/llm-d-modelservice                                        v0.2.11         1s

Test

go test -v -timeout 20m -ginkgo.focus="Scale-From-Zero Feature" -ginkgo.v .
=== RUN   TestE2E
Running Suite: E2E Test Suite - /home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e
============================================================================================
Random Seed: 1773071948

Will run 6 of 48 specs
------------------------------
[BeforeSuite] 
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/suite_test.go:49
  STEP: Loading configuration from environment @ 03/09/26 16:59:08.259
  === E2E Test Configuration ===
  Environment: kind-emulator
  WVA Namespace: workload-variant-autoscaler-system
  LLMD Namespace: llm-d-sim
  Use Simulator: true
  Scale-to-Zero Enabled: false
  Scaler Backend: prometheus-adapter
  Model ID: unsloth/Meta-Llama-3.1-8B
  Load Strategy: synthetic
  ==============================

  STEP: Initializing Kubernetes client @ 03/09/26 16:59:08.259
  STEP: Verifying WVA controller is running @ 03/09/26 16:59:08.263
  STEP: Verifying llm-d infrastructure @ 03/09/26 16:59:08.271
  STEP: Verifying Prometheus is available @ 03/09/26 16:59:08.272
  STEP: Restarting prometheus-adapter pods @ 03/09/26 16:59:08.274
  Deleted prometheus-adapter pod: prometheus-adapter-6748c5c5c6-b8q64
  Deleted prometheus-adapter pod: prometheus-adapter-6748c5c5c6-d8js6
  STEP: Waiting for prometheus-adapter pods to be ready @ 03/09/26 16:59:08.284
  prometheus-adapter pods restarted and ready
  BeforeSuite completed successfully - infrastructure ready
[BeforeSuite] PASSED [0.031 seconds]
------------------------------
SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
Scale-From-Zero Feature Initial state verification should have VariantAutoscaling resource created [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:219
  STEP: Waiting for InferencePool to be reconciled (allows time for controller to register it in datastore) @ 03/09/26 16:59:08.29
  Looking for EPP service: gaie-sim-epp in namespace: llm-d-sim
  STEP: Creating model service deployment with 0 initial replicas @ 03/09/26 16:59:13.301
  STEP: Scaling deployment to 0 replicas @ 03/09/26 16:59:13.309
  STEP: Creating service to expose model server @ 03/09/26 16:59:13.319
  STEP: Creating ServiceMonitor for metrics scraping @ 03/09/26 16:59:13.326
  STEP: Verifying deployment is at 0 replicas @ 03/09/26 16:59:13.335
  STEP: Creating VariantAutoscaling resource @ 03/09/26 16:59:18.341
  STEP: Creating scaler with minReplicas=0 (HPA or ScaledObject per backend) @ 03/09/26 16:59:18.35
  STEP: Waiting for VA to be ready and InferencePool to be available in datastore @ 03/09/26 16:59:18.354
  Scale-from-zero test setup complete with deployment at 0 replicas
  STEP: Verifying VariantAutoscaling exists @ 03/09/26 17:00:18.383
  VariantAutoscaling resource verified: scale-from-zero-va
• [70.095 seconds]
------------------------------
Scale-From-Zero Feature Initial state verification should verify deployment starts at zero replicas [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:232
  STEP: Checking deployment has 0 replicas @ 03/09/26 17:00:18.385
  Deployment verified at 0 replicas
• [0.001 seconds]
------------------------------
Scale-From-Zero Feature Initial state verification should have scaler configured with minReplicas=0 [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:246
  STEP: Verifying HPA allows scale-to-zero @ 03/09/26 17:00:18.386
• [0.001 seconds]
------------------------------
Scale-From-Zero Feature Scale-from-zero with pending requests should detect pending requests and trigger scale-from-zero [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:284
  STEP: Discovering inference gateway service @ 03/09/26 17:00:18.387
  Found inference gateway service: infra-sim-inference-gateway-istio
  STEP: Creating a job to send requests while deployment is at zero @ 03/09/26 17:00:18.389
  Created scale-from-zero trigger job: scale-from-zero-trigger-1773072018
  STEP: Waiting for job pod to be running and sending requests @ 03/09/26 17:00:18.397
  Job pod is running and sending requests
  STEP: Waiting for requests to queue up in EPP flow control queue @ 03/09/26 17:00:23.405
  STEP: Monitoring VariantAutoscaling for scale-from-zero decision @ 03/09/26 17:00:33.414
  VA DesiredOptimizedAlloc.NumReplicas: 1 (waiting for > 0)
    MetricsAvailable: True/ScaleFromZero (Scaled from zero due to pending requests)
  Scale-from-zero engine detected pending requests and recommended scale-up
• [15.028 seconds]
------------------------------
Scale-From-Zero Feature Scale-from-zero with pending requests should scale deployment up from zero [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:363
  STEP: Monitoring deployment for actual scale-up from zero @ 03/09/26 17:00:33.415
  Current replicas: 1, ready: 1 (waiting for > 0)
  Deployment successfully scaled up from zero
• [0.001 seconds]
------------------------------
Scale-From-Zero Feature Scale-from-zero with pending requests should successfully process requests after scaling up [smoke, full]
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/scale_from_zero_test.go:388
  STEP: Verifying the trigger job completes successfully @ 03/09/26 17:00:33.417
  Requests processed successfully after scale-from-zero
  STEP: Cleaning up trigger job @ 03/09/26 17:01:33.448
  STEP: Cleaning up scale-from-zero test resources @ 03/09/26 17:01:33.452
  Successfully deleted ServiceMonitor scale-from-zero-ms-monitor
• [60.059 seconds]
------------------------------
SSSSSSS
------------------------------
[AfterSuite] 
/home/wenzhou/tmp/llm-d-workload-variant-autoscaler/test/e2e/suite_test.go:215
  STEP: Cleaning up any leftover test resources @ 03/09/26 17:01:33.476
  Cleaning up test resources...
  Keeping Kind cluster for debugging (set DELETE_CLUSTER=true to delete)
[AfterSuite] PASSED [0.017 seconds]
------------------------------

Ran 6 of 48 Specs in 145.235 seconds
SUCCESS! -- 6 Passed | 0 Failed | 0 Pending | 42 Skipped
--- PASS: TestE2E (145.24s)
PASS
ok      github.com/llm-d/llm-d-workload-variant-autoscaler/test/e2e     145.256s

Copilot AI review requested due to automatic review settings March 7, 2026 18:34

zdtsw commented Mar 7, 2026

cc @shuynh2017


Copilot AI left a comment


Pull request overview

Adds configurability to deployment scripts to better support local development environments (notably Podman) and to centralize llm-d version/image selection so deployments stay consistent across components.

Changes:

  • Introduces LLM_D_RELEASE-driven defaults for llm-d component images in deploy/install.sh and updates the default release.
  • Adds CONTAINER_TOOL support (docker vs podman) for Kind-related image handling and container exec usage.
  • Updates docs/ignore files to reflect the new workflow and local repo cloning behavior.
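The centralization described above amounts to deriving each image default from a single release variable; a minimal sketch using the variable names from deploy/install.sh (values shown are the defaults documented in this PR):

```shell
#!/bin/sh
# One release variable supplies the default tag for every llm-d image;
# each per-image variable can still be overridden individually.
LLM_D_RELEASE=${LLM_D_RELEASE:-v0.5.1}
LLM_D_INFERENCE_SCHEDULER_IMG=${LLM_D_INFERENCE_SCHEDULER_IMG:-"ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE"}
LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
echo "$LLM_D_INFERENCE_SCHEDULER_IMG"
echo "$LLM_D_INFERENCE_SIM_IMG"
```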

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
deploy/kubernetes/create-kind-cluster-with-nvidia.sh Parameterizes container exec calls via CONTAINER_TOOL.
deploy/kind-emulator/install.sh Adds CONTAINER_TOOL and a podman-specific Kind image loading path.
deploy/install.sh Bumps default llm-d release, centralizes image defaults, and changes llm-d repo clone behavior.
deploy/README.md Documents Podman support and centralized LLM_D_RELEASE configuration.
Makefile Passes CONTAINER_TOOL through to deploy script in e2e infra target.
.gitignore Removes outdated ignore entries for old llm-d directory names.
.dockerignore Excludes llm-d/ from Docker build context.

Comment thread deploy/install.sh Outdated

if [ ! -d "$LLM_D_PROJECT" ]; then
log_info "Cloning $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null

Copilot AI Mar 7, 2026


git clone ... &> /dev/null suppresses all clone output. When the clone fails (e.g., tag doesn’t exist, network/auth issues), the script will exit due to set -e but without any actionable error details. Prefer letting stderr through or capturing it and emitting via log_error for easier troubleshooting.

Suggested change
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT > /dev/null


# Container tool (docker or podman)
CONTAINER_TOOL=${CONTAINER_TOOL:-docker}


Copilot AI Mar 7, 2026


Since CONTAINER_TOOL is executed directly (e.g. $CONTAINER_TOOL exec ...), validate it up front (only allow docker/podman) and fail fast with a clear message if the command is not found. This prevents hard-to-debug failures later and avoids running unexpected commands when the env var is mis-set.

Suggested change
# Validate container tool selection early to avoid confusing failures and unexpected commands
case "${CONTAINER_TOOL}" in
docker|podman)
if ! command -v "${CONTAINER_TOOL}" >/dev/null 2>&1; then
echo "Error: CONTAINER_TOOL is set to '${CONTAINER_TOOL}', but the command was not found in PATH." >&2
exit 1
fi
;;
*)
echo "Error: CONTAINER_TOOL must be either 'docker' or 'podman', but is set to '${CONTAINER_TOOL}'." >&2
exit 1
;;
esac

Comment on lines 185 to 189
# Pull a single-platform image so kind load does not hit "content digest not found"
# (multi-platform manifests can reference blobs that are not in the docker save stream).
# (multi-platform manifests can reference blobs that are not in the $CONTAINER_TOOL save stream).
local platform="${KIND_IMAGE_PLATFORM:-}"
if [ -z "$platform" ]; then
case "$(uname -m)" in

Copilot AI Mar 7, 2026


This block introduces $CONTAINER_TOOL in the comment/logic, but the subsequent image pull/inspect in this function still uses hard-coded docker commands. With CONTAINER_TOOL=podman, the pull path will fail. Please switch the pull/inspect commands to use $CONTAINER_TOOL (or branch per tool if flags differ, e.g. --platform).

Comment on lines +211 to +214
local tmp_tar="/tmp/wva-image-$(date +%s).tar"
$CONTAINER_TOOL save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"
rm -f "$tmp_tar"

Copilot AI Mar 7, 2026


The temporary tar archive is only removed on the success path. If save or kind load image-archive fails, the script can leak files in /tmp. Use mktemp and a trap to ensure the archive is cleaned up on all exits (including errors).

Suggested change
local tmp_tar="/tmp/wva-image-$(date +%s).tar"
$CONTAINER_TOOL save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"
rm -f "$tmp_tar"
local tmp_tar
tmp_tar="$(mktemp /tmp/wva-image-XXXXXX.tar)"
trap 'rm -f "$tmp_tar"' EXIT INT TERM
$CONTAINER_TOOL save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"
rm -f "$tmp_tar"
trap - EXIT INT TERM


# Container tool (docker or podman can pass from Makefile)
CONTAINER_TOOL=${CONTAINER_TOOL:-docker}


Copilot AI Mar 7, 2026


CONTAINER_TOOL is used as a command later (e.g. $CONTAINER_TOOL image inspect/save). To avoid confusing failures (or accidental execution of unexpected commands), validate the value early (e.g. only allow docker or podman) and error out with a clear message if it’s unsupported or not installed.

Suggested change
# Validate container tool early to avoid confusing failures or unexpected commands
case "$CONTAINER_TOOL" in
docker|podman)
if ! command -v "$CONTAINER_TOOL" >/dev/null 2>&1; then
echo -e "${RED}Error:${NC} Container tool '$CONTAINER_TOOL' is not installed or not in PATH." >&2
exit 1
fi
;;
*)
echo -e "${RED}Error:${NC} Unsupported CONTAINER_TOOL '$CONTAINER_TOOL'. Only 'docker' or 'podman' are supported." >&2
exit 1
;;
esac

Comment thread deploy/install.sh
Comment on lines +794 to +800
rm -rf "$LLM_D_PROJECT"
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it"
rm -rf "$LLM_D_PROJECT"

Copilot AI Mar 7, 2026


This re-clone logic is destructive: if $LLM_D_PROJECT exists at a different tag (or git describe --exact-match returns unknown), the script unconditionally rm -rfs the directory. That can delete local changes/worktrees unexpectedly. Consider a safer approach (e.g. git fetch --tags + git checkout $LLM_D_RELEASE, or gate deletion behind an explicit LLM_D_FORCE_RECLONE=true / confirmation).

Suggested change
rm -rf "$LLM_D_PROJECT"
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it"
rm -rf "$LLM_D_PROJECT"
if [ "${LLM_D_FORCE_RECLONE:-false}" = "true" ]; then
log_warning "LLM_D_FORCE_RECLONE=true, removing existing directory '$LLM_D_PROJECT' and re-cloning"
rm -rf "$LLM_D_PROJECT"
else
log_info "Attempting to update existing repository in-place to release '$LLM_D_RELEASE'"
(
cd "$LLM_D_PROJECT"
git fetch --tags origin >/dev/null 2>&1
if git rev-parse "$LLM_D_RELEASE" >/dev/null 2>&1; then
git checkout "$LLM_D_RELEASE" >/dev/null 2>&1
log_success "Updated existing $LLM_D_PROJECT repository to $LLM_D_RELEASE"
else
log_error "Release '$LLM_D_RELEASE' not found in existing $LLM_D_PROJECT repo. Set LLM_D_FORCE_RECLONE=true to allow deleting and re-cloning, or update the repository manually."
fi
)
fi
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
if [ "${LLM_D_FORCE_RECLONE:-false}" = "true" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it because LLM_D_FORCE_RECLONE=true"
rm -rf "$LLM_D_PROJECT"
else
log_error "$LLM_D_PROJECT exists but is not a git repository. Move or remove it, or set LLM_D_FORCE_RECLONE=true to allow deletion."
fi

Copilot AI review requested due to automatic review settings March 7, 2026 19:14

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

if ! docker image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make docker-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
if ! $CONTAINER_TOOL image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"

Copilot AI Mar 7, 2026


The suggested fix command in this error message (make $CONTAINER_TOOL-build ...) doesn’t match the Makefile: there is a docker-build target that uses $(CONTAINER_TOOL) internally, but no podman-build target. Update the message to point to the actual build invocation (e.g., make docker-build CONTAINER_TOOL=$CONTAINER_TOOL ...).

Suggested change
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make docker-build CONTAINER_TOOL=$CONTAINER_TOOL IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"

Comment thread deploy/install.sh
Comment on lines +789 to +801
# Clone llm-d repo if not exists, or re-clone if version mismatch detected
if [ -d "$LLM_D_PROJECT/.git" ]; then
# Check current version (try tag first, then branch)
CURRENT_VERSION=$(cd "$LLM_D_PROJECT" && git describe --tags --exact-match 2>/dev/null || git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
if [ "$CURRENT_VERSION" != "$LLM_D_RELEASE" ]; then
log_warning "$LLM_D_PROJECT exists but has version '$CURRENT_VERSION' (expected: $LLM_D_RELEASE)"
rm -rf "$LLM_D_PROJECT"
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it"
rm -rf "$LLM_D_PROJECT"

Copilot AI Mar 7, 2026


Automatically rm -rf-ing an existing $LLM_D_PROJECT directory on version mismatch can unexpectedly delete local changes (or other content if the directory is user-managed). Consider a safer approach: git fetch --tags + git checkout "$LLM_D_RELEASE", or cloning into a versioned directory (e.g., llm-d-$LLM_D_RELEASE) and reusing it when present.

Suggested change
# Clone llm-d repo if not exists, or re-clone if version mismatch detected
if [ -d "$LLM_D_PROJECT/.git" ]; then
# Check current version (try tag first, then branch)
CURRENT_VERSION=$(cd "$LLM_D_PROJECT" && git describe --tags --exact-match 2>/dev/null || git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
if [ "$CURRENT_VERSION" != "$LLM_D_RELEASE" ]; then
log_warning "$LLM_D_PROJECT exists but has version '$CURRENT_VERSION' (expected: $LLM_D_RELEASE)"
rm -rf "$LLM_D_PROJECT"
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_warning "$LLM_D_PROJECT exists but is not a git repository - removing it"
rm -rf "$LLM_D_PROJECT"
# Clone llm-d repo if not exists, or update in-place if version mismatch detected
if [ -d "$LLM_D_PROJECT/.git" ]; then
# Check current version (try tag first, then branch)
CURRENT_VERSION=$(cd "$LLM_D_PROJECT" && git describe --tags --exact-match 2>/dev/null || git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
if [ "$CURRENT_VERSION" != "$LLM_D_RELEASE" ]; then
log_warning "$LLM_D_PROJECT exists but has version '$CURRENT_VERSION' (expected: $LLM_D_RELEASE)"
log_info "Attempting to fetch and check out release '$LLM_D_RELEASE' in existing repository"
if ! (cd "$LLM_D_PROJECT" && git fetch --tags origin && git fetch origin "$LLM_D_RELEASE" && git checkout "$LLM_D_RELEASE"); then
log_error "Failed to switch $LLM_D_PROJECT to release '$LLM_D_RELEASE'. Please clean or backup '$LLM_D_PROJECT' and try again, or set LLM_D_PROJECT to a different path."
exit 1
fi
else
log_info "$LLM_D_PROJECT directory already exists with correct version ($LLM_D_RELEASE)"
fi
elif [ -d "$LLM_D_PROJECT" ]; then
log_error "$LLM_D_PROJECT exists but is not a git repository. Please remove or move this directory, or set LLM_D_PROJECT to a different path."
exit 1

Comment thread deploy/install.sh Outdated

if [ ! -d "$LLM_D_PROJECT" ]; then
log_info "Cloning $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null

Copilot AI Mar 7, 2026


git clone arguments here are unquoted. Since these values come from environment variables, whitespace or unexpected characters in $LLM_D_RELEASE, $LLM_D_OWNER, or $LLM_D_PROJECT can break the command (and make debugging harder). Quote the variables and the URL/path arguments.

Suggested change
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null
git clone -b "$LLM_D_RELEASE" -- "https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git" "$LLM_D_PROJECT" &> /dev/null

Comment thread deploy/install.sh
Comment on lines +871 to +872
# Sanitize model name for k8s label (if MODEL_ID is unsloth/Meta-Llama-3.1-8B, label uses unsloth-Meta-Llama-3.1-8B)
MODEL_ID_SANITIZED=$(echo "$MODEL_ID" | tr '/' '-')

Copilot AI Mar 7, 2026


MODEL_ID_SANITIZED only replaces / with -, but Kubernetes label values must be <=63 chars and match the label-value character rules. Many HuggingFace model IDs can exceed 63 chars or include other invalid characters, causing the deployment/helmfile apply to fail. Consider a more robust sanitization (replace any invalid chars, trim leading/trailing non-alphanumerics, and truncate to 63).

Suggested change
# Sanitize model name for k8s label (if MODEL_ID is unsloth/Meta-Llama-3.1-8B, label uses unsloth-Meta-Llama-3.1-8B)
MODEL_ID_SANITIZED=$(echo "$MODEL_ID" | tr '/' '-')
# Sanitize model name for k8s label:
# - replace '/' and whitespace with '-'
# - replace any remaining invalid chars with '-'
# - trim leading/trailing non-alphanumerics
# - truncate to 63 characters and ensure it ends with an alphanumeric
MODEL_ID_SANITIZED=$(echo "$MODEL_ID" \
| tr '/[:space:]' '-' \
| sed 's/[^A-Za-z0-9_.-]/-/g')
# Trim leading non-alphanumeric characters
MODEL_ID_SANITIZED=${MODEL_ID_SANITIZED##[^A-Za-z0-9]*}
# Trim trailing non-alphanumeric characters
MODEL_ID_SANITIZED=${MODEL_ID_SANITIZED%%[^A-Za-z0-9]*}
# Truncate to 63 characters (Kubernetes label value max length)
MODEL_ID_SANITIZED=${MODEL_ID_SANITIZED:0:63}
# After truncation, ensure it does not end with non-alphanumeric characters
MODEL_ID_SANITIZED=$(echo "$MODEL_ID_SANITIZED" | sed 's/[^A-Za-z0-9]*$//')

Comment thread deploy/README.md
Comment on lines +643 to +648
| `LLM_D_RELEASE` | llm-d release version (controls all llm-d images) | `v0.5.1` |
| `LLM_D_INFERENCE_SCHEDULER_IMG` | Override llm-d inference scheduler image | `ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
| `CONTAINER_TOOL` | Container tool to use (docker or podman) | `docker` |

**Centralized llm-d Version Management**: Setting `LLM_D_RELEASE` automatically configures all llm-d component images to use the same release version. This ensures version consistency across the llm-d inference scheduler and simulator. Individual image variables can override this if needed.

Copilot AI Mar 7, 2026


The docs imply LLM_D_RELEASE “controls all llm-d component images”, but in the scripts some llm-d images are still configured independently (e.g., the inference-sim decode/prefill image is set via LLM_D_INFERENCE_SIM_IMG_REPO/LLM_D_INFERENCE_SIM_IMG_TAG, which defaults to latest in the kind-emulator env script). Consider narrowing the wording to the images that are actually tied to LLM_D_RELEASE, or updating the scripts so the simulator tag defaults to LLM_D_RELEASE as well.

Suggested change
| `LLM_D_RELEASE` | llm-d release version (controls all llm-d images) | `v0.5.1` |
| `LLM_D_INFERENCE_SCHEDULER_IMG` | Override llm-d inference scheduler image | `ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
| `CONTAINER_TOOL` | Container tool to use (docker or podman) | `docker` |
**Centralized llm-d Version Management**: Setting `LLM_D_RELEASE` automatically configures all llm-d component images to use the same release version. This ensures version consistency across the llm-d inference scheduler and simulator. Individual image variables can override this if needed.
| `LLM_D_RELEASE` | llm-d release version (controls default tags for core llm-d component images used by WVA) | `v0.5.1` |
| `LLM_D_INFERENCE_SCHEDULER_IMG` | Override llm-d inference scheduler image | `ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
| `CONTAINER_TOOL` | Container tool to use (docker or podman) | `docker` |
**Centralized llm-d Version Management**: Setting `LLM_D_RELEASE` configures the default release tag for the llm-d component images used by this deployment (specifically, the inference scheduler and simulator images above). This helps maintain version consistency across these core components. Additional llm-d images defined in platform-specific scripts may use their own configuration variables and are not automatically controlled by `LLM_D_RELEASE`, but individual image variables can always override the defaults if needed.

Copilot AI review requested due to automatic review settings March 8, 2026 19:07

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

deploy/kind-emulator/install.sh:200

  • In load_image, the pull path still hard-codes docker pull / docker image inspect. When CONTAINER_TOOL=podman (or docker isn’t installed), this will fail even though the script advertises Podman support. Use $CONTAINER_TOOL consistently for pull/inspect (and note that podman pull doesn’t support --platform in the same way as docker, so you may need a tool-specific branch).
        # Pull a single-platform image so kind load does not hit "content digest not found"
        # (multi-platform manifests can reference blobs that are not in the $CONTAINER_TOOL save stream).
        local platform="${KIND_IMAGE_PLATFORM:-}"
        if [ -z "$platform" ]; then
            case "$(uname -m)" in
                aarch64|arm64) platform="linux/arm64" ;;
                *) platform="linux/amd64" ;;
            esac
        fi
        log_info "Pulling single-platform image for KIND (platform=$platform) to avoid load errors..."
        if ! docker pull --platform "$platform" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"; then
            log_warning "Failed to pull image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' (platform=$platform)"
            log_info "Attempting to use existing local image..."
            if ! docker image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
                log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build or pull the image"
                exit 1

)

// Default to v1alpha2 group if empty
// Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty

Copilot AI Mar 8, 2026


The comment here is grammatically unclear ("already by default use v1 than v1alpha2") and makes the reasoning harder to follow. Please reword it (e.g., "Default to v1 if empty, since llm-d defaults to v1 now").

Suggested change
// Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty
// Default to v1 if empty, since llm-d now defaults to v1 instead of v1alpha2.

$CONTAINER_TOOL exec -ti kind-control-plane umount -R /proc/driver/nvidia

# According to https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html
echo "> Adding/updateding the NVIDIA Helm repository"

Copilot AI Mar 8, 2026


Spelling in the log message: updatedingupdating (makes CLI output look unpolished).

Suggested change
echo "> Adding/updateding the NVIDIA Helm repository"
echo "> Adding/updating the NVIDIA Helm repository"

Comment thread deploy/install.sh
Comment on lines +841 to +842
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"

Copilot AI Mar 8, 2026


The git clone error check is broken because the command is piped into grep -v. With set -o pipefail, the pipeline’s exit status can become grep’s, and a successful clone that only prints “Cloning into …” will likely make grep -v exit 1, causing the script to treat the clone as failed. Avoid piping the clone output into grep for suppression; redirect output instead (or capture output without affecting the exit code).

Suggested change
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
if ! clone_output=$(git clone -b "$LLM_D_RELEASE" -- "https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git" "$LLM_D_PROJECT" 2>&1); then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
printf '%s\n' "$clone_output"

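The exit-status pitfall described above is easy to reproduce. Below is a minimal sketch with a fake clone command standing in for `git clone` (the function names are illustrative, not from the script): with `pipefail` set, filtering the clone's only output line through `grep -v` makes a successful clone look like a failure, while capturing the output first preserves the real exit code.

```shell
#!/usr/bin/env bash
set -o pipefail

fake_clone() {             # succeeds, printing only the progress line
    printf 'Cloning into repo...\n'
    return 0
}

demo_pitfall() {
    # grep -v filters the only line, emits nothing, and exits 1 --
    # with pipefail the whole pipeline then reports failure.
    fake_clone | grep -v "Cloning into"
    echo "pipeline exit: $?"
}

demo_fix() {
    # Capture output first; the if condition now reflects the clone itself,
    # and the output is only printed on failure.
    local out
    if ! out=$(fake_clone 2>&1); then
        printf '%s\n' "$out"
        echo "clone failed"
        return 1
    fi
    echo "clone ok"
}
```

Here `demo_pitfall` prints `pipeline exit: 1` even though the fake clone succeeded, while `demo_fix` correctly reports success.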
Comment thread deploy/install.sh
# TODO: update once the llm-d-inference-scheduler v0.5.0 is released
LLM_D_INFERENCE_SCHEDULER_IMG=${LLM_D_INFERENCE_SCHEDULER_IMG:-"ghcr.io/llm-d/llm-d-inference-scheduler:v0.5.0-rc.1"}
LLM_D_INFERENCE_SCHEDULER_IMG=${LLM_D_INFERENCE_SCHEDULER_IMG:-"ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE"}
LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
Copilot AI Mar 8, 2026

This script now defines LLM_D_INFERENCE_SIM_IMG based on LLM_D_RELEASE, but later it still patches the llm-d values using LLM_D_INFERENCE_SIM_IMG_REPO:$LLM_D_INFERENCE_SIM_IMG_TAG (and LLM_D_INFERENCE_SIM_IMG_REPO is not defined anywhere in this script). To avoid producing an invalid image reference, either define the repo/tag variables here or standardize on a single variable (LLM_D_INFERENCE_SIM_IMG) and update the downstream patching logic accordingly.

Suggested change
LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
LLM_D_INFERENCE_SIM_IMG=${LLM_D_INFERENCE_SIM_IMG:-"ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE"}
LLM_D_INFERENCE_SIM_IMG_REPO=${LLM_D_INFERENCE_SIM_IMG_REPO:-"ghcr.io/llm-d/llm-d-inference-sim"}
LLM_D_INFERENCE_SIM_IMG_TAG=${LLM_D_INFERENCE_SIM_IMG_TAG:-"$LLM_D_RELEASE"}

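An alternative to defining separate repo/tag defaults is to keep the single `LLM_D_INFERENCE_SIM_IMG` variable and derive the repo and tag downstream with parameter expansion. A hypothetical sketch (variable names mirror the script but the split logic is an assumption, and it would need adjusting for registries with ports or digest references):

```shell
#!/usr/bin/env bash
LLM_D_RELEASE="${LLM_D_RELEASE:-v0.5.1}"
LLM_D_INFERENCE_SIM_IMG="${LLM_D_INFERENCE_SIM_IMG:-ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE}"

# Strip from the last ':' to get the repo; strip up to the last ':' for the tag.
LLM_D_INFERENCE_SIM_IMG_REPO="${LLM_D_INFERENCE_SIM_IMG%:*}"
LLM_D_INFERENCE_SIM_IMG_TAG="${LLM_D_INFERENCE_SIM_IMG##*:}"

echo "repo: $LLM_D_INFERENCE_SIM_IMG_REPO"
echo "tag:  $LLM_D_INFERENCE_SIM_IMG_TAG"
```

With the defaults above this yields `ghcr.io/llm-d/llm-d-inference-sim` and `v0.5.1`, so the downstream patching logic can stay on repo/tag variables while users override only one image reference.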
Comment thread deploy/install.sh
elif [ "$POOL_GROUP" = "inference.networking.x-k8s.io" ]; then
POOL_VERSION="v1alpha2"
else
log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)"
Copilot AI Mar 8, 2026

log_error is called here before it is defined later in the script. If an invalid POOL_GROUP is provided, this will result in log_error: command not found and a less clear failure mode. Consider moving the helper function definitions above this validation block, or replace this call with a plain echo/printf + exit 1 at this early point in the script.

Suggested change
log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)"
echo -e "${RED}Error:${NC} Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)" >&2

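The ordering problem can be avoided by defining the logging helpers before any validation runs. A minimal sketch, assuming the script's color conventions (the exact `log_error` body and color codes are assumptions, not copied from install.sh):

```shell
#!/usr/bin/env bash
# Define helpers first so early validation can fail with a clear message.
RED='\033[0;31m'
NC='\033[0m'

log_error() {
    echo -e "${RED}Error:${NC} $*" >&2
}

POOL_GROUP="${POOL_GROUP:-inference.networking.k8s.io}"
case "$POOL_GROUP" in
    inference.networking.k8s.io)   POOL_VERSION="v1" ;;
    inference.networking.x-k8s.io) POOL_VERSION="v1alpha2" ;;
    *)
        log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)"
        exit 1
        ;;
esac
echo "resolved POOL_VERSION: $POOL_VERSION"
```

With the default group this resolves `POOL_VERSION` to `v1`; an unknown group hits `log_error` that is guaranteed to exist at that point.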
Comment thread deploy/install.sh

# Create InferenceModel for second model (maps model name to pool)
# Note: InferenceModel CRD may not be available in all environments
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
Copilot AI Mar 8, 2026

Typo in the TODO comment: InfereceModelRewrite is misspelled, which makes it harder to search for the intended resource/name later. Please correct it to InferenceModelRewrite.

Suggested change
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InferenceModelRewrite instead

Copilot AI review requested due to automatic review settings March 9, 2026 13:46
Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 8 comments.

Comment thread deploy/install.sh
Comment on lines +841 to +844
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
return 1
fi
Copilot AI Mar 9, 2026

The git clone command is piped to grep -v, which changes the pipeline exit code semantics: if clone output is fully filtered (or empty), grep can exit non-zero and make a successful clone look like a failure. Prefer checking git clone’s exit status directly (e.g., run it normally/with -q, redirecting stdout as needed while preserving the clone exit code).

Comment thread deploy/install.sh

# Create InferenceModel for second model (maps model name to pool)
# Note: InferenceModel CRD may not be available in all environments
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
Copilot AI Mar 9, 2026

Typo in comment: InfereceModelRewrite should be InferenceModelRewrite (and/or clarify the intended resource name).

Comment on lines +186 to 189
# (multi-platform manifests can reference blobs that are not in the $CONTAINER_TOOL save stream).
local platform="${KIND_IMAGE_PLATFORM:-}"
if [ -z "$platform" ]; then
case "$(uname -m)" in
Copilot AI Mar 9, 2026

In load_image(), Podman support is only partially applied: this hunk updates the comment to mention $CONTAINER_TOOL, but the subsequent pull/inspect logic still uses docker pull / docker image inspect. With CONTAINER_TOOL=podman and WVA_IMAGE_PULL_POLICY!=IfNotPresent, the script will still require Docker. Suggest switching those commands to $CONTAINER_TOOL (or branching like the Kind load step) so the pull path works with Podman too.

Comment on lines +137 to +142
WVA provides a **single consolidated E2E suite** that runs on Kind clusters with emulated GPUs. Tests create VA, HPA, and model services dynamically as part of the test workflow.

> **Note**: E2E tests are only supported on Kind clusters (`ENVIRONMENT=kind-emulator`). The test fixtures and labels are configured specifically for the Kind emulator deployment which uses the `simulated-accelerators` guide from llm-d. Running E2E tests on other environments (OpenShift, generic Kubernetes) is not supported for now.

- **Location**: `test/e2e/`
- **Environments**: Kind (emulated), OpenShift, or generic Kubernetes
- **Supported Environment**: Kind (emulated GPUs only)
Copilot AI Mar 9, 2026

This documentation change says E2E tests are only supported on Kind and that OpenShift/generic Kubernetes is not supported. That conflicts with the Makefile’s stated goal of “environment-agnostic” consolidated E2E targets (which accept ENVIRONMENT=$(ENVIRONMENT)) and the presence of an OpenShift E2E workflow in .github/workflows/ci-e2e-openshift.yaml. Please align the docs with actual supported environments (either restore OpenShift/Kubernetes support in the docs, or update the Makefile/CI if support was intentionally dropped).

Comment on lines +157 to +159
// Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty
if poolGroup == "" {
poolGroup = PoolGroupV1Alpha2
poolGroup = PoolGroupV1
Copilot AI Mar 9, 2026

The inline comment is hard to read and grammatically incorrect (“llm-d already by default use v1 than v1alpha2”). Please rephrase so it clearly explains why v1 is the default (e.g., “Default to v1 when poolGroup is empty”).

Comment on lines +179 to 181
if ! $CONTAINER_TOOL image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
else
Copilot AI Mar 9, 2026

This error message suggests running make $CONTAINER_TOOL-build ..., but the Makefile target is still named docker-build (it just uses CONTAINER_TOOL internally). Consider updating the example to something that actually works, e.g. make docker-build CONTAINER_TOOL=$CONTAINER_TOOL IMG=....

Comment on lines +211 to +214
local tmp_tar="/tmp/wva-image-$(date +%s).tar"
$CONTAINER_TOOL save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"
rm -f "$tmp_tar"
Copilot AI Mar 9, 2026

For Podman loading, the temp tar path is predictable and won’t be cleaned up if podman save or kind load fails. Prefer using mktemp (or similar) and a trap to ensure the archive is always removed on exit/error.

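The cleanup pattern suggested above can be sketched as follows. The `save`/`load` commands are shown as comments mirroring the script (not executed here); the point is the `mktemp` path plus an `EXIT` trap, so the archive is removed even when save or load fails partway:

```shell
#!/usr/bin/env bash
# Unpredictable temp path instead of a timestamp-based name.
tmp_tar="$(mktemp)"
# The trap fires on normal exit and on failure paths alike.
trap 'rm -f "$tmp_tar"' EXIT

# "$CONTAINER_TOOL" save -o "$tmp_tar" "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG"
# kind load image-archive "$tmp_tar" --name "$CLUSTER_NAME"

echo "staged archive: $tmp_tar"
```

`kind load image-archive` does not require a `.tar` extension, so the bare `mktemp` path is sufficient.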
Comment on lines +83 to +84
// double ehck flowcontrol is enabled as env var
for _, container := range pod.Spec.Containers {
Copilot AI Mar 9, 2026

Typo in comment: "double ehck" → "double check".

@zdtsw
Collaborator Author

zdtsw commented Mar 9, 2026

/test-e2e-full

@shuynh2017
Collaborator

@zdtsw one failed test, not sure if it's caused by your changes.

@shuynh2017
Collaborator

/ok-to-test

@shuynh2017
Collaborator

@lionelvillard @asm582 there are a few changes here, but they look OK. Please review.

@shuynh2017
Collaborator

@zdtsw, please check the one failing test

@shuynh2017
Collaborator

@zdtsw this may fix the smoke test #870

zdtsw added 10 commits March 11, 2026 09:05
- use env variable LLM_D_RELEASE to control all images in
  deploy/install.sh
- clone llm-d locally and reuse the local copy if it matches the
  required release version
- use env variable CONTAINER_TOOL to support podman on fedora
- remove/update *ignore files

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- old test was based on v1alpha2 of GIE for infpool
- new default is v1 for infpool

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- the label we should use is "llm-d.ai/inference-serving:true"

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- if wva was started before the CRD (v1 or v1alpha2) was installed,
  it hit an empty result

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- since the sim image has a different release, we cannot use the same
  env variable

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- if e2e is only tested on kind with the specific guide gaie-sim, make
  that clear and fix the label values
- llm-d.ai/guide:simulated-accelerators
- llm-d.ai/model:random

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
- extend timeout in test
- add INFO log to understand why the VA was not found

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Copilot AI review requested due to automatic review settings March 11, 2026 08:06
Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 6 comments.

Comment thread deploy/README.md
| `WVA_IMAGE_PULL_POLICY` | Image pull policy | `Always` |
| `LLM_D_RELEASE` | llm-d release version (controls all llm-d images) | `v0.5.1` |
| `LLM_D_INFERENCE_SCHEDULER_IMG` | Override llm-d inference scheduler image | `ghcr.io/llm-d/llm-d-inference-scheduler:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
Copilot AI Mar 11, 2026

The config reference documents LLM_D_INFERENCE_SIM_IMG as the way to override the inference-sim image, but deploy/install.sh updates the primary modelservice values using LLM_D_INFERENCE_SIM_IMG_REPO + LLM_D_INFERENCE_SIM_IMG_TAG (and only uses LLM_D_INFERENCE_SIM_IMG for the optional second model). Either align the script to use LLM_D_INFERENCE_SIM_IMG consistently, or update this documentation to describe the actual override variables that take effect.

Suggested change
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG_REPO` | Override primary llm-d inference simulator image repository | `ghcr.io/llm-d/llm-d-inference-sim` |
| `LLM_D_INFERENCE_SIM_IMG_TAG` | Override primary llm-d inference simulator image tag | `$LLM_D_RELEASE` |
| `LLM_D_INFERENCE_SIM_IMG` | Override llm-d inference simulator image for the optional second model | `ghcr.io/llm-d/llm-d-inference-sim:$LLM_D_RELEASE` |

Comment on lines 274 to +281
if !pendingRequestExist {
// Log INFO only when queue exists but model doesn't match
if queueMetricFound {
logger.Info("Scale-from-zero: queue has pending requests but model not matched",
"va", va.Name,
"vaModelID", va.Spec.ModelID,
"queueModels", queueMetricModels)
}
Copilot AI Mar 11, 2026

The new INFO log inside the 100ms scale-from-zero loop can flood logs when the queue has pending requests for other models (queueMetricFound=true but no match). This effectively reintroduces high-volume logging at INFO level; consider downgrading this to DEBUG, adding rate limiting, or logging only on state transition (e.g., first time mismatch is observed).

Comment on lines +137 to 143
WVA provides a **single consolidated E2E suite** that runs on Kind clusters with emulated GPUs. Tests create VA, HPA, and model services dynamically as part of the test workflow.

> **Note**: E2E tests are only supported on Kind clusters (`ENVIRONMENT=kind-emulator`). The test fixtures and labels are configured specifically for the Kind emulator deployment which uses the `simulated-accelerators` guide from llm-d. Running E2E tests on other environments (OpenShift, generic Kubernetes) is not supported for now.

- **Location**: `test/e2e/`
- **Environments**: Kind (emulated), OpenShift, or generic Kubernetes
- **Supported Environment**: Kind (emulated GPUs only)
- **Tiers**: Smoke (~5–10 min) for PRs; full suite (~15–25 min) for comprehensive validation
Copilot AI Mar 11, 2026

The note says E2E tests are only supported on Kind, but this doc section still includes OpenShift-specific quick start instructions and later presents an OpenShift E2E column in the comparison matrix. Either remove/update those OpenShift references, or reword this note to clarify which suites run on which environments (e.g., Kind-only for test/e2e/ but separate OpenShift suites exist).

Comment thread deploy/install.sh
Comment on lines 838 to +844
if [ ! -d "$LLM_D_PROJECT" ]; then
log_info "Cloning $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null
else
log_warning "$LLM_D_PROJECT directory already exists, skipping clone"
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
return 1
fi
log_success "Successfully cloned $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
Copilot AI Mar 11, 2026

The git clone success/failure check is unreliable because the pipeline exit status comes from grep, not git clone. If git clone succeeds but only prints “Cloning into…”, grep -v will return exit code 1 (no output) and this branch will incorrectly treat the clone as failed. Capture git clone's exit code (e.g., avoid a pipe, or use PIPESTATUS) and only filter output for logging after verifying success.

if !isReady {
continue
}
// double ehck flowcontrol is enabled as env var
Copilot AI Mar 11, 2026

Typo in comment: "double ehck" → "double check".

Suggested change
// double ehck flowcontrol is enabled as env var
// double check flowcontrol is enabled as env var

Comment thread deploy/install.sh

# Create InferenceModel for second model (maps model name to pool)
# Note: InferenceModel CRD may not be available in all environments
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
Copilot AI Mar 11, 2026

Typo in TODO comment: "InfereceModelRewrite" should be "InferenceModelRewrite".

Suggested change
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InferenceModelRewrite instead

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
@github-actions
Contributor

github-actions bot commented Apr 3, 2026

This PR is marked as stale after 21d of inactivity. After an additional 14d of inactivity (7d to become rotten, then 7d more), it will be closed. To prevent this PR from being closed, add a comment or remove the lifecycle/stale label.
