chore: benchmark test run (draft)#1017

Draft
kahilam wants to merge 3 commits into main from benchmark/test-run-20260414

Conversation

Collaborator

@kahilam kahilam commented Apr 15, 2026

Purpose

Draft PR for testing /benchmark openshift CI trigger.

Investigation: Gateway 500 Failure

Problem: The Gateway connectivity check consistently fails with an HTTP 500 from istio-envoy (empty response body).

Root cause: The llm-d-infra chart (v1.4.0) creates the Gateway with istio.io/enable-inference-extproc: "true", which requires Istio to natively support InferencePool-based ext_proc routing. The Istio/OSSM version on the CI OpenShift cluster doesn't appear to support this feature.
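A quick sketch of how to confirm the annotation and reproduce the failure (namespace and resource names are assumptions, adjust for the actual deployment):

```shell
# Check whether the deployed Gateway carries the ext_proc annotation
# set by the llm-d-infra chart (namespace "llm-d" is an assumption).
kubectl -n llm-d get gateway -o yaml | grep enable-inference-extproc

# Reproduce the connectivity check; -i shows the status line and the
# empty body returned by istio-envoy. GW_HOST lookup assumes the
# Gateway publishes its address in .status.addresses.
GW_HOST=$(kubectl -n llm-d get gateway -o jsonpath='{.items[0].status.addresses[0].value}')
curl -si "http://${GW_HOST}/v1/models"
```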

What was tried:

  1. Changed LLM_D_RELEASE default from v0.3.0 to main (aligned CRDs with GA API)
  2. Patched EPP --pool-group to match the InferencePool API group (inference.networking.k8s.io)

Neither fixed the 500 — the issue is at the Istio layer, not our deployment scripts.
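The two attempts above can be sketched roughly as follows (the deployment name `epp` and namespace are hypothetical placeholders; the actual patch mechanism in the scripts may differ):

```shell
# 1. Point the deploy scripts at main so the CRDs match the GA API group.
export LLM_D_RELEASE=main

# 2. Append --pool-group to the EPP container args so it watches the
#    GA InferencePool API group instead of the pre-GA default.
kubectl -n llm-d patch deployment epp --type=json -p '[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/args/-",
   "value": "--pool-group=inference.networking.k8s.io"}
]'
```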

Note: The benchmark-openshift job has never succeeded in CI history (18+ consecutive failures), confirming this is a pre-existing infrastructure issue.

Next step: Need to confirm the Istio/OSSM version on the CI cluster. InferencePool ext_proc support requires Istio 1.24+.
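A few ways the Istio/OSSM version could be checked on the CI cluster (operator namespace and CSV naming are assumptions; OSSM installs may not ship `istioctl`):

```shell
# Upstream Istio: report control-plane and data-plane versions.
istioctl version

# Fallback: read the version from the istiod image tag.
kubectl -n istio-system get deployment istiod \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# OSSM (operator-managed): inspect the installed ClusterServiceVersion.
oc get csv -n openshift-operators | grep -i servicemesh
```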

AI-assisted using Cursor IDE.

kahilam added 3 commits April 15, 2026 10:33
Istio 1.29+ expects the GA API group (inference.networking.k8s.io)
for InferencePool resources. The previous default (v0.3.0) deployed
inferencepool chart v1.0.1 with the alpha API, causing Istio to
never configure ext_proc and the Gateway to return HTTP 500.

Updating the default to main deploys inferencepool chart v1.4.0
which uses the GA API, fixing Gateway routing.

Made-with: Cursor
The inferencepool Helm chart v1.4.0 creates InferencePool in
inference.networking.k8s.io/v1 (GA), but llm-d-inference-scheduler
defaults --pool-group to inference.networking.x-k8s.io (pre-GA).
This mismatch causes the EPP to never discover the pool, leaving
it with no endpoints — resulting in HTTP 500 from the Gateway.

Patch the EPP deployment args to use --pool-group=inference.networking.k8s.io
after image patching, aligning it with the InferencePool resource.

Made-with: Cursor
@kahilam kahilam requested a review from asm582 April 15, 2026 21:35