Istio 1.29+ expects the GA API group (inference.networking.k8s.io) for InferencePool resources. The previous default (v0.3.0) deployed inferencepool chart v1.0.1 with the alpha API, so Istio never configured ext_proc and the Gateway returned HTTP 500. Updating the default to main deploys inferencepool chart v1.4.0, which uses the GA API and fixes Gateway routing.
Made-with: Cursor
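A pre-flight check in the deployment scripts could flag this kind of API group mismatch early. The sketch below is hypothetical (the helper names are ours, not from this PR or any library): it takes an InferencePool manifest already loaded as a dict, splits its `apiVersion` into group and version, and checks that the group is the GA `inference.networking.k8s.io`. The `v1alpha2` version string in the example is illustrative.

```python
# Hypothetical pre-flight check: confirm an InferencePool manifest uses the
# GA API group (inference.networking.k8s.io) rather than the pre-GA
# inference.networking.x-k8s.io group.
GA_GROUP = "inference.networking.k8s.io"


def api_group(api_version: str) -> str:
    """Return the group part of a Kubernetes apiVersion ("" for the core group)."""
    # Named groups use "<group>/<version>"; the core group is just "<version>".
    group, sep, _version = api_version.rpartition("/")
    return group if sep else ""


def uses_ga_inferencepool(manifest: dict) -> bool:
    """True if the manifest is an InferencePool in the GA API group."""
    return (
        manifest.get("kind") == "InferencePool"
        and api_group(manifest.get("apiVersion", "")) == GA_GROUP
    )


if __name__ == "__main__":
    ga = {"apiVersion": "inference.networking.k8s.io/v1", "kind": "InferencePool"}
    pre_ga = {"apiVersion": "inference.networking.x-k8s.io/v1alpha2", "kind": "InferencePool"}
    print(uses_ga_inferencepool(ga))      # True
    print(uses_ga_inferencepool(pre_ga))  # False
```

Matching on the group alone, not the full `apiVersion`, keeps the check valid if the GA group later gains another version.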
The inferencepool Helm chart v1.4.0 creates InferencePool in inference.networking.k8s.io/v1 (GA), but llm-d-inference-scheduler defaults --pool-group to inference.networking.x-k8s.io (pre-GA). Because of this mismatch, the EPP never discovers the pool and is left with no endpoints, so the Gateway returns HTTP 500. After image patching, patch the EPP deployment args to use --pool-group=inference.networking.k8s.io so that the EPP matches the InferencePool resource.
Made-with: Cursor
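The args rewrite described above can be sketched as follows. This is a hypothetical helper, not the script used in this PR: it assumes the EPP container's current args are available as a list of strings and that the EPP is the first container in the pod template (index 0). It replaces `--pool-group=X` or `--pool-group X` with the GA group, appends the flag if it is missing, and emits an RFC 6902 JSON patch.

```python
# Hypothetical sketch: point the EPP's --pool-group flag at the GA
# InferencePool group and build a JSON patch for the Deployment.
import json

GA_GROUP = "inference.networking.k8s.io"


def set_pool_group(args: list[str], group: str = GA_GROUP) -> list[str]:
    """Return a copy of args with --pool-group set to `group`."""
    out: list[str] = []
    found = False
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--pool-group":
            # "--pool-group X" form: drop the flag and its value, emit one token.
            out.append(f"--pool-group={group}")
            found = True
            i += 2
            continue
        if arg.startswith("--pool-group="):
            out.append(f"--pool-group={group}")
            found = True
        else:
            out.append(arg)
        i += 1
    if not found:
        out.append(f"--pool-group={group}")
    return out


def json_patch(args: list[str], container_index: int = 0) -> str:
    """JSON patch replacing the container's args (assumes args already exist)."""
    return json.dumps([{
        "op": "replace",
        "path": f"/spec/template/spec/containers/{container_index}/args",
        "value": set_pool_group(args),
    }])
```

The resulting string could be applied with `kubectl patch deployment <epp-deployment> --type=json -p "$PATCH"`, where the deployment name is a placeholder. Rewriting the whole args list avoids depending on the flag's position.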
Purpose
Draft PR for testing the `/benchmark openshift` CI trigger.

Investigation: Gateway 500 Failure
Problem: The Gateway connectivity check always fails with HTTP 500 from `istio-envoy` (empty body).

Root cause: The `llm-d-infra` chart (v1.4.0) creates the Gateway with `istio.io/enable-inference-extproc: "true"`, which requires Istio to natively support InferencePool-based ext_proc routing. The Istio/OSSM version on the CI OpenShift cluster does not appear to support this feature.

What was tried:
- Updated the `LLM_D_RELEASE` default from `v0.3.0` to `main` (aligned CRDs with the GA API).
- Patched the EPP `--pool-group` to match the InferencePool API group (`inference.networking.k8s.io`).

Neither change fixed the 500. The issue is at the Istio layer, not in our deployment scripts.
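To keep this failure distinguishable from other errors in CI logs, the connectivity check could match the observed signature explicitly. The helper below is a hypothetical sketch; it assumes that "from `istio-envoy`" refers to the `server` response header. Envoy can also set that header on responses it proxies, so a match only identifies the symptom seen here, not where the 500 originated.

```python
# Hypothetical helper: does a Gateway response match the observed failure
# signature (HTTP 500, server: istio-envoy, empty body)?
def is_istio_layer_500(status: int, headers: dict[str, str], body: bytes) -> bool:
    # HTTP header names are case-insensitive, so normalise keys before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return (
        status == 500
        and lowered.get("server", "").strip().lower() == "istio-envoy"
        and body.strip() == b""
    )
```

A CI script could use this to log a targeted hint ("check Istio ext_proc support") instead of a generic connectivity failure.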
Note: The `benchmark-openshift` job has never succeeded in CI history (18+ consecutive failures), confirming that this is a pre-existing infrastructure issue.

Next step: Confirm the Istio/OSSM version on the CI cluster. InferencePool ext_proc support requires Istio 1.24+.
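Once the version is known, the comparison itself is simple. The sketch below is hypothetical: it assumes the upstream Istio version string is already in hand (how to retrieve it from the cluster is out of scope), and since OSSM uses its own release numbering, it expects the underlying Istio version rather than the OSSM one. It compares only major and minor, using the 1.24 minimum stated above.

```python
# Hypothetical sketch: check a reported Istio version against a minimum
# (default 1.24, the ext_proc requirement cited in this PR).
import re

MIN_ISTIO = (1, 24)


def parse_major_minor(version: str) -> tuple[int, int]:
    # Accept forms like "1.24.2", "v1.26.0", or "1.27.0-beta.1"; only major.minor matter.
    m = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(m.group(1)), int(m.group(2))


def supports_inference_extproc(version: str, minimum: tuple[int, int] = MIN_ISTIO) -> bool:
    # Tuple comparison orders by major first, then minor.
    return parse_major_minor(version) >= minimum
```

Passing `minimum=(1, 29)` would instead check the Istio 1.29+ GA API group expectation mentioned in the first commit message.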
AI-assisted using Cursor IDE.