
Conversation


@KillianGolds KillianGolds commented Nov 28, 2025

Overview

This PR migrates KServe's LLMInferenceService to Gateway API Inference Extension (GIE) v1.1.0, implementing a dual-pool architecture that preserves backward compatibility with zero downtime.

Upstream Challenges & Contributions:
During this migration, critical bugs were discovered in upstream dependencies and had to be fixed before this work could land.

Major Dependency Upgrades (merged from upstream during development):

  • Kubernetes: v0.33.1 → v0.34.1
  • Gateway API: v1.2.1 → v1.4.0
  • Gateway API Inference Extension: v0.3.0 → v1.0.0
  • KEDA: v2.16.1 → v2.18.0
  • controller-runtime: v0.19.7 → v0.22.3

Updates KServe's LLMInferenceService from GIE v1alpha2 to GIE v1.1.0 with:

  • ✅ Dual InferencePool creation (GIE v1 + v1alpha2 simultaneously)
  • ✅ v1alpha2 API versioning with conversion webhooks (v1alpha2 storage ↔ v1alpha1 served)
  • ✅ One-way traffic migration from v1alpha2 → v1 InferencePools
  • ✅ Zero-downtime migration with backward compatibility
  • ✅ Stop feature support with dual API version testing

Fixes: RHOAIENG-34472

Key Files for Review

API Layer

  • pkg/apis/serving/v1alpha2/ - New v1alpha2 API types (storage version)
  • pkg/apis/serving/v1alpha1/llm_inference_service_conversion.go - v1alpha1↔v1alpha2 conversion
  • pkg/apis/serving/v1alpha2/llm_inference_service_defaults.go - v1alpha2 defaulting logic

Controller Layer

  • pkg/controller/llmisvc/scheduler.go - Dual InferencePool creation (GIE v1 + v1alpha2)
  • pkg/controller/llmisvc/router.go - Updated for v1alpha2 types and dual pool status checks
  • pkg/controller/llmisvc/controller.go - Main reconciliation loop updates

Configuration

  • config/crd/full/serving.kserve.io_llminferenceservices.yaml - Multi-version CRD
  • config/crd/full/llmisvc_conversion_patch.yaml - Conversion webhook config
  • config/default/llmisvc_cainjection_conversion_webhook.yaml - CA injection for conversion

Registration & Dependencies

  • cmd/manager/main.go - v1alpha2 API and webhook registration
  • go.mod - GIE v1.1.0 dependency (sigs.k8s.io/gateway-api-inference-extension v1.1.0)

Tests

  • pkg/controller/llmisvc/controller_int_stop_test.go - Integration tests for stop feature with v1alpha2
  • test/e2e/llmisvc/ - E2E tests updated for dual API version support

🏗️ Architecture

Dual-Pool Strategy

┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (/v1/completions, /v1/chat/completions)          │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs (Initially)                             │   │
│  │  • v1alpha2 InferencePool: weight=100 (100% traffic) │   │
│  │  • v1 InferencePool: weight=0 (0% traffic)           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                          ↓
              ┌──────────────────────┐
              │ Migration Trigger    │
              │ (v1 pool ready)      │
              └──────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (After Migration)                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs                                         │   │
│  │  • v1 InferencePool: weight=100 (100% traffic)       │   │
│  │  • Annotation: inference-pool-migrated=v1            │   │
│  │  • One-way migration (no rollback)                   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
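
As a rough Go sketch of the initial weighted split shown above, using Gateway API types (the helper name and the assumption that both pools share the service's name are illustrative, not the actual controller code):

```go
package sketch

import (
	"k8s.io/utils/ptr"
	gwv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// initialBackendRefs sketches the pre-migration HTTPRoute backends:
// all traffic on the legacy v1alpha2 pool, the v1 pool present at weight 0.
func initialBackendRefs(poolName string) []gwv1.HTTPBackendRef {
	kind := gwv1.Kind("InferencePool")
	legacyGroup := gwv1.Group("inference.networking.x-k8s.io") // GIE v1alpha2
	v1Group := gwv1.Group("inference.networking.k8s.io")       // GIE v1
	return []gwv1.HTTPBackendRef{
		{BackendRef: gwv1.BackendRef{
			BackendObjectReference: gwv1.BackendObjectReference{
				Group: &legacyGroup, Kind: &kind, Name: gwv1.ObjectName(poolName),
			},
			Weight: ptr.To(int32(100)), // 100% of traffic initially
		}},
		{BackendRef: gwv1.BackendRef{
			BackendObjectReference: gwv1.BackendObjectReference{
				Group: &v1Group, Kind: &kind, Name: gwv1.ObjectName(poolName),
			},
			Weight: ptr.To(int32(0)), // flipped to 100 once the v1 pool is Ready
		}},
	}
}
```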

API Version Flow

┌────────────┐    Conversion     ┌────────────┐
│  v1alpha1  │ ←──  Webhook  ──→ │  v1alpha2  │
│  (Served)  │                   │  (Storage) │
└────────────┘                   └────────────┘
     ↓                                  ↓
  Existing                         Saved in
  Deployments                      etcd
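
The conversion follows controller-runtime's hub-and-spoke pattern: the storage version acts as the hub, and the served version converts to and from it. Below is a minimal, self-contained sketch of that pattern with toy stand-in types; the real field mapping lives in llm_inference_service_conversion.go:

```go
package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/conversion"
)

// HubService stands in for the v1alpha2 storage (hub) version.
type HubService struct {
	metav1.TypeMeta
	Model string
}

func (s *HubService) DeepCopyObject() runtime.Object { c := *s; return &c }
func (*HubService) Hub()                             {} // marks the storage version as the hub

// SpokeService stands in for the v1alpha1 served (spoke) version.
type SpokeService struct {
	metav1.TypeMeta
	Model string
}

func (s *SpokeService) DeepCopyObject() runtime.Object { c := *s; return &c }

// ConvertTo maps the served version onto the hub on writes.
func (s *SpokeService) ConvertTo(dst conversion.Hub) error {
	dst.(*HubService).Model = s.Model // real code maps spec and status field by field
	return nil
}

// ConvertFrom rebuilds the served version from the hub on reads.
func (s *SpokeService) ConvertFrom(src conversion.Hub) error {
	s.Model = src.(*HubService).Model
	return nil
}
```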

Migration Behavior

One-Way Migration Logic

  1. Initial State: LLMInferenceService created
    • Both v1 and v1alpha2 InferencePools created simultaneously
    • HTTPRoute configured with both as backends
  2. Migration Trigger: v1 InferencePool becomes Ready
    • Controller detects v1 pool readiness
    • Shifts traffic: 100% → v1, 0% → v1alpha2
    • Sets annotation: serving.kserve.io/inference-pool-migrated: v1
  3. Final State:
    • Traffic permanently on the v1 InferencePool
    • No rollback even if the v1 pool later fails (prevents flapping)
    • v1alpha2 pool remains (for future cleanup/deprecation)
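
A hedged sketch of the one-way guard described above; the helper and its signature are hypothetical, but the logic mirrors the stated behavior: once the migrated annotation is present, pool readiness is never re-evaluated for rollback.

```go
package sketch

// migratedAnnotation matches the annotation named above; the helper is illustrative.
const migratedAnnotation = "serving.kserve.io/inference-pool-migrated"

// shouldMigrate permits exactly one v1alpha2 -> v1 traffic shift.
func shouldMigrate(routeAnnotations map[string]string, v1PoolReady bool) bool {
	if routeAnnotations[migratedAnnotation] == "v1" {
		return false // one-way: no rollback even if the v1 pool later degrades
	}
	return v1PoolReady
}
```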

Feature/Issue validation/testing

Unit & Integration Tests (make test)

  • All tests pass, including llmisvc controller tests (151.844s)
  • Validates multi-node deployments, storage configuration, router lifecycle, stop/resume, RBAC propagation, and resource cleanup

End-to-End Tests (test/e2e/llmisvc/)

  • 30 passed, 3 skipped (42m 56s)
  • Skipped tests require auth-specific cluster configuration

Test Updates
Integration tests were updated to accommodate stricter v1alpha2 CRD validation (required containers field in PodSpec fixtures).
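
For context, a minimal sketch of the kind of fixture change this implies, with placeholder names and images (the actual fixtures live in the integration test builders):

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// minimalWorkerPodSpec satisfies the stricter v1alpha2 CRD validation,
// which rejects PodSpec fixtures with an empty containers list.
func minimalWorkerPodSpec() corev1.PodSpec {
	return corev1.PodSpec{
		Containers: []corev1.Container{{
			Name:  "main",                                    // placeholder
			Image: "registry.example.com/placeholder:latest", // placeholder
		}},
	}
}
```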

Notes for reviewers

  1. Integration test fixtures updated - Tests now include required containers field in PodSpec to satisfy stricter v1alpha2 CRD validation. This is not a behavioral change, just test fixture compliance.

  2. HTTPRoute deletion test expectation corrected - The test previously expected HTTPRoutesReady=True when Router is set to nil. The correct behavior (which the controller implements) is to clear the condition entirely. Test now verifies the condition is nil.

  3. Dual InferencePool support in router - The router now checks both inference.networking.k8s.io (v1) and inference.networking.x-k8s.io (v1alpha2) InferencePool APIs, enabling gradual migration between GIE versions (a lookup sketch follows this list).

  4. Route config uses weighted backends - config-llm-router-route.yaml configures both v1 (weight: 0) and v1alpha2 (weight: 100) backends, allowing traffic migration once v1 pools become ready.
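
A sketch of the dual-group lookup mentioned in note 3, using the dynamic client; the function name is illustrative, and the real router also inspects status conditions:

```go
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// GVRs for the two InferencePool APIs the router has to consider.
var (
	v1PoolGVR       = schema.GroupVersionResource{Group: "inference.networking.k8s.io", Version: "v1", Resource: "inferencepools"}
	v1alpha2PoolGVR = schema.GroupVersionResource{Group: "inference.networking.x-k8s.io", Version: "v1alpha2", Resource: "inferencepools"}
)

// poolExists checks one InferencePool API group for the named pool.
func poolExists(ctx context.Context, c dynamic.Interface, gvr schema.GroupVersionResource, ns, name string) (bool, error) {
	_, err := c.Resource(gvr).Namespace(ns).Get(ctx, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	return err == nil, err
}
```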

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?
  • Have you linked the JIRA issue(s) to this PR?

Release note:


Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

KillianGolds and others added 22 commits November 28, 2025 01:38
Add the new v1alpha2 API version as the storage version for LLMInferenceService.
This introduces the v1alpha2 types that will serve as the internal storage
representation while v1alpha1 remains the served API version.

Signed-off-by: Killian Golds <[email protected]>
Implement conversion webhooks between v1alpha1 (served) and v1alpha2 (storage).
This enables the API server to convert resources between versions automatically.

Signed-off-by: Killian Golds <[email protected]>
Move validation webhooks for v1alpha1 and add v1alpha2 validation webhooks
to the API package. This provides validation for both API versions.

Signed-off-by: Killian Golds <[email protected]>
Update CRDs to support both v1alpha1 (served) and v1alpha2 (storage) with
conversion webhooks. Add CA injection patches for the conversion webhook
certificates.

Signed-off-by: Killian Golds <[email protected]>
Update scheduler to create dual InferencePool objects (v1 and v1alpha2).
This ensures compatibility with both the new GIE v1 API and legacy v1alpha2.
Uses DynamicClient for v1alpha2 InferencePool operations.

Key changes:
- Creates v1 InferencePool (new GIE API) using typed client
- Creates v1alpha2 InferencePool (legacy) using dynamic client
- Manages v1alpha2 InferenceModel for scheduler routing

Signed-off-by: Killian Golds <[email protected]>
Update router to work with v1alpha2 API and reference the correct InferencePool.
Includes updates to discovery, validation, and gateway condition handling.

Signed-off-by: Killian Golds <[email protected]>
Update workload management and lifecycle handling for v1alpha2 API.
Includes single-node, multi-node, storage, TLS, and OCP SCC handling.

Signed-off-by: Killian Golds <[email protected]>
Update controller setup with DynamicClient for v1 InferencePool operations.

Key changes:
- Register v1alpha2 API watches
- Add DynamicClient for v1 InferencePool operations
- Update scheme registration for v1alpha2 types

Signed-off-by: Killian Golds <[email protected]>
Add utility for safe Kubernetes resource naming to handle cases where
service names might exceed K8s naming limits or contain invalid characters.

Signed-off-by: Killian Golds <[email protected]>
Update monitoring resources and sample generation for v1alpha2 API types.

Signed-off-by: Killian Golds <[email protected]>
Update integration test fixtures and builders to support v1alpha2 API types.

Signed-off-by: Killian Golds <[email protected]>
Update main.go to register v1alpha2 scheme and webhooks for multi-version
CRD support.

Signed-off-by: Killian Golds <[email protected]>
Regenerate Go and Python client code for multi-version API support.

Signed-off-by: Killian Golds <[email protected]>
Add e2e test infrastructure that parametrizes tests across both API versions.

Key features:
- @pytest.fixture(params=["v1alpha1", "v1alpha2"]) for API version parametrization
- 28 test items (14 v1alpha1 + 14 v1alpha2)
- Tests conversion webhook path (v1alpha1 submission -> v1alpha2 storage)

Signed-off-by: Killian Golds <[email protected]>
Update Go module dependencies for Gateway API v1.3.0 and controller-runtime v0.22.

Signed-off-by: Killian Golds <[email protected]>
Fix compatibility issues with K8s 0.34 API changes in v1beta1 types.

Signed-off-by: Killian Golds <[email protected]>
Fix linter warnings from controller-runtime upgrade in v1beta1 controller.

Signed-off-by: Killian Golds <[email protected]>
Update auto-generated CRDs, RBAC rules, OpenAPI specs, deepcopy functions,
Python SDK docs, and violation exceptions list.

Signed-off-by: Killian Golds <[email protected]>
KEDA v2.18.0 (required for controller-runtime v0.22+) requires Go 1.24.7.
The UBI go-toolset:1.24 image only provides Go 1.24.6, so switch to the
official golang:1.24.7 image for the builder stage.

Signed-off-by: Killian Golds <[email protected]>
- Update integration tests to use v1alpha2 API and GIE v1
- Add API version parameterization to e2e stop tests for v1alpha1/v1alpha2
- Use unstructured for InferenceModel (v1alpha2) in integration tests

Signed-off-by: Killian Golds <[email protected]>
Brings in:
- odh 3.1 release tag bump (opendatahub-io#988)
- Do not register ClusterServingRuntime (opendatahub-io#990)

Stop feature conflicts resolved by keeping our GIE v1 integrated version.

Signed-off-by: Killian Golds <[email protected]>
@openshift-ci

openshift-ci bot commented Nov 28, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot

openshift-ci-robot commented Nov 28, 2025

@KillianGolds: This pull request references RHOAIENG-34472 which is a valid jira issue.


@coderabbitai

coderabbitai bot commented Nov 28, 2025

Important: review skipped (draft detected).

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@openshift-ci

openshift-ci bot commented Nov 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: KillianGolds
Once this PR has been reviewed and has the lgtm label, please assign brettmthompson for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment




- Add processConfigTemplates() to pre-process Go templates in config
  files before loading into envtest (fixes CRD regex validation)
- Update multi-node tests to use SimpleWorkerPodSpec() instead of
  empty PodSpec (required containers field)
- Add required containers to storage test PodSpec fixtures
- Fix HTTPRoute deletion test to expect cleared condition (not True)
- Fix stop test duplicate IstioShadowService creation

Signed-off-by: Killian Golds <[email protected]>
The GIE v1.1.0 migration requires Go 1.24.7 due to dependency chain:
- controller-runtime v0.22+ -> KEDA v2.18.0 -> Go 1.24.7

The UBI go-toolset:1.24 image only has Go 1.24.6, so switch to the
official golang:1.24.7 image for the builder stage.

Signed-off-by: Killian Golds <[email protected]>
@KillianGolds KillianGolds force-pushed the RHOAIENG-34472-GIEv1-clean branch from 7f96f85 to 4fdc1d4 Compare November 28, 2025 13:07
Delete 16 auto-generated test files with invalid Python syntax
due to OpenAPI generator bug. Add them to .openapi-generator-ignore
to prevent regeneration on make precommit.

Signed-off-by: Killian Golds <[email protected]>
@KillianGolds

/test e2e-llm-inference-service

@openshift-ci

openshift-ci bot commented Nov 28, 2025

@KillianGolds: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • Test name: ci/prow/e2e-llm-inference-service
  • Commit: 2c2193f
  • Required: true
  • Rerun command: /test e2e-llm-inference-service

