
Conversation


@KillianGolds KillianGolds commented Nov 28, 2025

Overview

This PR migrates KServe's LLMInferenceService to Gateway API Inference Extension (GIE) v1.1.0, implementing a dual-pool architecture that preserves backward compatibility with zero downtime.

Upstream Challenges & Contributions:
During this migration, critical bugs were discovered in upstream dependencies and had to be fixed before this work could land.

Major Dependency Upgrades (merged from upstream during development):

  • Kubernetes: v0.33.1 → v0.34.1
  • Gateway API: v1.2.1 → v1.4.0
  • Gateway API Inference Extension: v0.3.0 → v1.0.0
  • KEDA: v2.16.1 → v2.18.0
  • controller-runtime: v0.19.7 → v0.22.3

Updates KServe's LLMInferenceService from GIE v1alpha2 to GIE v1.1.0 with:

  • ✅ Dual InferencePool creation (GIE v1 + v1alpha2 simultaneously)
  • ✅ v1alpha2 API versioning with conversion webhooks (v1alpha2 storage ↔ v1alpha1 served)
  • ✅ One-way traffic migration from v1alpha2 → v1 InferencePools
  • ✅ Zero-downtime migration with backward compatibility
  • ✅ Stop feature support with dual API version testing

Fixes: RHOAIENG-34472

Key Files for Review

API Layer

  • pkg/apis/serving/v1alpha2/ - New v1alpha2 API types (storage version)
  • pkg/apis/serving/v1alpha1/llm_inference_service_conversion.go - v1alpha1↔v1alpha2 conversion
  • pkg/apis/serving/v1alpha2/llm_inference_service_defaults.go - v1alpha2 defaulting logic

Controller Layer

  • pkg/controller/llmisvc/scheduler.go - Dual InferencePool creation (GIE v1 + v1alpha2)
  • pkg/controller/llmisvc/router.go - Updated for v1alpha2 types and dual pool status checks
  • pkg/controller/llmisvc/controller.go - Main reconciliation loop updates

Configuration

  • config/crd/full/serving.kserve.io_llminferenceservices.yaml - Multi-version CRD
  • config/crd/full/llmisvc_conversion_patch.yaml - Conversion webhook config
  • config/default/llmisvc_cainjection_conversion_webhook.yaml - CA injection for conversion

Registration & Dependencies

  • cmd/manager/main.go - v1alpha2 API and webhook registration
  • go.mod - GIE v1.1.0 dependency (sigs.k8s.io/gateway-api-inference-extension v1.1.0)

Tests

  • pkg/controller/llmisvc/controller_int_stop_test.go - Integration tests for stop feature with v1alpha2
  • test/e2e/llmisvc/ - E2E tests updated for dual API version support

🏗️ Architecture

Dual-Pool Strategy

┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (/v1/completions, /v1/chat/completions)          │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs (Initially)                             │   │
│  │  • v1alpha2 InferencePool: weight=100 (100% traffic) │   │
│  │  • v1 InferencePool: weight=0 (0% traffic)           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                          ↓
              ┌──────────────────────┐
              │ Migration Trigger    │
              │ (v1 pool ready)      │
              └──────────────────────┘
                          ↓
┌─────────────────────────────────────────────────────────────┐
│  HTTPRoute (After Migration)                                │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ Backend Refs                                         │   │
│  │  • v1 InferencePool: weight=100 (100% traffic)       │   │
│  │  • Annotation: inference-pool-migrated=v1            │   │
│  │  • One-way migration (no rollback)                   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
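
As a rough Go sketch of the initial weighted split shown above, using Gateway API types (the helper name and the assumption that both pools share the service's name are illustrative, not the actual controller code):

```go
package sketch

import (
	"k8s.io/utils/ptr"
	gwv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// initialBackendRefs sketches the pre-migration HTTPRoute backends:
// all traffic on the legacy v1alpha2 pool, the v1 pool present at weight 0.
func initialBackendRefs(poolName string) []gwv1.HTTPBackendRef {
	kind := gwv1.Kind("InferencePool")
	legacyGroup := gwv1.Group("inference.networking.x-k8s.io") // GIE v1alpha2
	v1Group := gwv1.Group("inference.networking.k8s.io")       // GIE v1
	return []gwv1.HTTPBackendRef{
		{BackendRef: gwv1.BackendRef{
			BackendObjectReference: gwv1.BackendObjectReference{
				Group: &legacyGroup, Kind: &kind, Name: gwv1.ObjectName(poolName),
			},
			Weight: ptr.To(int32(100)), // 100% of traffic initially
		}},
		{BackendRef: gwv1.BackendRef{
			BackendObjectReference: gwv1.BackendObjectReference{
				Group: &v1Group, Kind: &kind, Name: gwv1.ObjectName(poolName),
			},
			Weight: ptr.To(int32(0)), // flipped to 100 once the v1 pool is Ready
		}},
	}
}
```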

API Version Flow

┌────────────┐    Conversion     ┌────────────┐
│  v1alpha1  │ ←──  Webhook  ──→ │  v1alpha2  │
│  (Served)  │                   │  (Storage) │
└────────────┘                   └────────────┘
     ↓                                  ↓
  Existing                         Saved in
  Deployments                      etcd
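
The conversion follows controller-runtime's hub-and-spoke pattern: the storage version acts as the hub, and the served version converts to and from it. Below is a minimal, self-contained sketch of that pattern with toy stand-in types; the real field mapping lives in llm_inference_service_conversion.go:

```go
package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/conversion"
)

// HubService stands in for the v1alpha2 storage (hub) version.
type HubService struct {
	metav1.TypeMeta
	Model string
}

func (s *HubService) DeepCopyObject() runtime.Object { c := *s; return &c }
func (*HubService) Hub()                             {} // marks the storage version as the hub

// SpokeService stands in for the v1alpha1 served (spoke) version.
type SpokeService struct {
	metav1.TypeMeta
	Model string
}

func (s *SpokeService) DeepCopyObject() runtime.Object { c := *s; return &c }

// ConvertTo maps the served version onto the hub on writes.
func (s *SpokeService) ConvertTo(dst conversion.Hub) error {
	dst.(*HubService).Model = s.Model // real code maps spec and status field by field
	return nil
}

// ConvertFrom rebuilds the served version from the hub on reads.
func (s *SpokeService) ConvertFrom(src conversion.Hub) error {
	s.Model = src.(*HubService).Model
	return nil
}
```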

Migration Behavior

One-Way Migration Logic

  1. Initial State: LLMInferenceService created
    • Both v1 and v1alpha2 InferencePools created simultaneously
    • HTTPRoute configured with both as backends
  2. Migration Trigger: v1 InferencePool becomes Ready
    • Controller detects v1 pool readiness
    • Shifts traffic: 100% → v1, 0% → v1alpha2
    • Sets annotation: serving.kserve.io/inference-pool-migrated: v1
  3. Final State:
    • Traffic permanently on the v1 InferencePool
    • No rollback even if the v1 pool later fails (prevents flapping)
    • v1alpha2 pool remains (for future cleanup/deprecation)
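
A hedged sketch of the one-way guard described above; the helper and its signature are hypothetical, but the logic mirrors the stated behavior: once the migrated annotation is present, pool readiness is never re-evaluated for rollback.

```go
package sketch

// migratedAnnotation matches the annotation named above; the helper is illustrative.
const migratedAnnotation = "serving.kserve.io/inference-pool-migrated"

// shouldMigrate permits exactly one v1alpha2 -> v1 traffic shift.
func shouldMigrate(routeAnnotations map[string]string, v1PoolReady bool) bool {
	if routeAnnotations[migratedAnnotation] == "v1" {
		return false // one-way: no rollback even if the v1 pool later degrades
	}
	return v1PoolReady
}
```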

Feature/Issue validation/testing

Unit & Integration Tests (make test)

  • All tests pass, including llmisvc controller tests (151.844s)
  • Validates multi-node deployments, storage configuration, router lifecycle, stop/resume, RBAC propagation, and resource cleanup

End-to-End Tests (test/e2e/llmisvc/)

  • 30 passed, 3 skipped (42m 56s)
  • Skipped tests require auth-specific cluster configuration

Test Updates
Integration tests were updated to accommodate stricter v1alpha2 CRD validation (required containers field in PodSpec fixtures).
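
For context, a minimal sketch of the kind of fixture change this implies, with placeholder names and images (the actual fixtures live in the integration test builders):

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// minimalWorkerPodSpec satisfies the stricter v1alpha2 CRD validation,
// which rejects PodSpec fixtures with an empty containers list.
func minimalWorkerPodSpec() corev1.PodSpec {
	return corev1.PodSpec{
		Containers: []corev1.Container{{
			Name:  "main",                                    // placeholder
			Image: "registry.example.com/placeholder:latest", // placeholder
		}},
	}
}
```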

Notes for reviewers

  1. Integration test fixtures updated - Tests now include required containers field in PodSpec to satisfy stricter v1alpha2 CRD validation. This is not a behavioral change, just test fixture compliance.

  2. HTTPRoute deletion test expectation corrected - The test previously expected HTTPRoutesReady=True when Router is set to nil. The correct behavior (which the controller implements) is to clear the condition entirely. Test now verifies the condition is nil.

  3. Dual InferencePool support in router - The router now checks both inference.networking.k8s.io (v1) and inference.networking.x-k8s.io (v1alpha2) InferencePool APIs, enabling gradual migration between GIE versions (a lookup sketch follows this list).

  4. Route config uses weighted backends - config-llm-router-route.yaml configures both v1 (weight: 0) and v1alpha2 (weight: 100) backends, allowing traffic migration once v1 pools become ready.
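
A sketch of the dual-group lookup mentioned in note 3, using the dynamic client; the function name is illustrative, and the real router also inspects status conditions:

```go
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// GVRs for the two InferencePool APIs the router has to consider.
var (
	v1PoolGVR       = schema.GroupVersionResource{Group: "inference.networking.k8s.io", Version: "v1", Resource: "inferencepools"}
	v1alpha2PoolGVR = schema.GroupVersionResource{Group: "inference.networking.x-k8s.io", Version: "v1alpha2", Resource: "inferencepools"}
)

// poolExists checks one InferencePool API group for the named pool.
func poolExists(ctx context.Context, c dynamic.Interface, gvr schema.GroupVersionResource, ns, name string) (bool, error) {
	_, err := c.Resource(gvr).Namespace(ns).Get(ctx, name, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	return err == nil, err
}
```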

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?
  • Have you linked the JIRA issue(s) to this PR?

Release note:


Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

KillianGolds and others added 22 commits November 28, 2025 01:38
Add the new v1alpha2 API version as the storage version for LLMInferenceService.
This introduces the v1alpha2 types that will serve as the internal storage
representation while v1alpha1 remains the served API version.

Signed-off-by: Killian Golds <[email protected]>
Implement conversion webhooks between v1alpha1 (served) and v1alpha2 (storage).
This enables the API server to convert resources between versions automatically.

Signed-off-by: Killian Golds <[email protected]>
Move validation webhooks for v1alpha1 and add v1alpha2 validation webhooks
to the API package. This provides validation for both API versions.

Signed-off-by: Killian Golds <[email protected]>
Update CRDs to support both v1alpha1 (served) and v1alpha2 (storage) with
conversion webhooks. Add CA injection patches for the conversion webhook
certificates.

Signed-off-by: Killian Golds <[email protected]>
Update scheduler to create dual InferencePool objects (v1 and v1alpha2).
This ensures compatibility with both the new GIE v1 API and legacy v1alpha2.
Uses DynamicClient for v1alpha2 InferencePool operations.

Key changes:
- Creates v1 InferencePool (new GIE API) using typed client
- Creates v1alpha2 InferencePool (legacy) using dynamic client
- Manages v1alpha2 InferenceModel for scheduler routing

Signed-off-by: Killian Golds <[email protected]>
Update router to work with v1alpha2 API and reference the correct InferencePool.
Includes updates to discovery, validation, and gateway condition handling.

Signed-off-by: Killian Golds <[email protected]>
Update workload management and lifecycle handling for v1alpha2 API.
Includes single-node, multi-node, storage, TLS, and OCP SCC handling.

Signed-off-by: Killian Golds <[email protected]>
Update controller setup with DynamicClient for v1 InferencePool operations.

Key changes:
- Register v1alpha2 API watches
- Add DynamicClient for v1 InferencePool operations
- Update scheme registration for v1alpha2 types

Signed-off-by: Killian Golds <[email protected]>
Add utility for safe Kubernetes resource naming to handle cases where
service names might exceed K8s naming limits or contain invalid characters.

Signed-off-by: Killian Golds <[email protected]>
Update monitoring resources and sample generation for v1alpha2 API types.

Signed-off-by: Killian Golds <[email protected]>
Update integration test fixtures and builders to support v1alpha2 API types.

Signed-off-by: Killian Golds <[email protected]>
Update main.go to register v1alpha2 scheme and webhooks for multi-version
CRD support.

Signed-off-by: Killian Golds <[email protected]>
Regenerate Go and Python client code for multi-version API support.

Signed-off-by: Killian Golds <[email protected]>
Add e2e test infrastructure that parametrizes tests across both API versions.

Key features:
- @pytest.fixture(params=["v1alpha1", "v1alpha2"]) for API version parametrization
- 28 test items (14 v1alpha1 + 14 v1alpha2)
- Tests conversion webhook path (v1alpha1 submission -> v1alpha2 storage)

Signed-off-by: Killian Golds <[email protected]>
Update Go module dependencies for Gateway API v1.3.0 and controller-runtime v0.22.

Signed-off-by: Killian Golds <[email protected]>
Fix compatibility issues with K8s 0.34 API changes in v1beta1 types.

Signed-off-by: Killian Golds <[email protected]>
Fix linter warnings from controller-runtime upgrade in v1beta1 controller.

Signed-off-by: Killian Golds <[email protected]>
Update auto-generated CRDs, RBAC rules, OpenAPI specs, deepcopy functions,
Python SDK docs, and violation exceptions list.

Signed-off-by: Killian Golds <[email protected]>
KEDA v2.18.0 (required for controller-runtime v0.22+) requires Go 1.24.7.
The UBI go-toolset:1.24 image only provides Go 1.24.6, so switch to the
official golang:1.24.7 image for the builder stage.

Signed-off-by: Killian Golds <[email protected]>
- Update integration tests to use v1alpha2 API and GIE v1
- Add API version parameterization to e2e stop tests for v1alpha1/v1alpha2
- Use unstructured for InferenceModel (v1alpha2) in integration tests

Signed-off-by: Killian Golds <[email protected]>
Brings in:
- odh 3.1 release tag bump (opendatahub-io#988)
- Do not register ClusterServingRuntime (opendatahub-io#990)

Stop feature conflicts resolved by keeping our GIE v1 integrated version.

Signed-off-by: Killian Golds <[email protected]>
@openshift-ci

openshift-ci bot commented Nov 28, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot

openshift-ci-robot commented Nov 28, 2025

@KillianGolds: This pull request references RHOAIENG-34472 which is a valid jira issue.


@coderabbitai

coderabbitai bot commented Nov 28, 2025

Important: review skipped (draft detected).

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@openshift-ci

openshift-ci bot commented Nov 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: KillianGolds
Once this PR has been reviewed and has the lgtm label, please assign brettmthompson for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment




- Add processConfigTemplates() to pre-process Go templates in config
  files before loading into envtest (fixes CRD regex validation)
- Update multi-node tests to use SimpleWorkerPodSpec() instead of
  empty PodSpec (required containers field)
- Add required containers to storage test PodSpec fixtures
- Fix HTTPRoute deletion test to expect cleared condition (not True)
- Fix stop test duplicate IstioShadowService creation

Signed-off-by: Killian Golds <[email protected]>
The GIE v1.1.0 migration requires Go 1.24.7 due to dependency chain:
- controller-runtime v0.22+ -> KEDA v2.18.0 -> Go 1.24.7

The UBI go-toolset:1.24 image only has Go 1.24.6, so switch to the
official golang:1.24.7 image for the builder stage.

Signed-off-by: Killian Golds <[email protected]>
@KillianGolds KillianGolds force-pushed the RHOAIENG-34472-GIEv1-clean branch from 7f96f85 to 4fdc1d4 Compare November 28, 2025 13:07
Delete 16 auto-generated test files with invalid Python syntax
due to OpenAPI generator bug. Add them to .openapi-generator-ignore
to prevent regeneration on make precommit.

Signed-off-by: Killian Golds <[email protected]>
@KillianGolds

/test e2e-llm-inference-service

@openshift-ci

openshift-ci bot commented Nov 28, 2025

@KillianGolds: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • Test name: ci/prow/e2e-llm-inference-service
  • Commit: 2c2193f
  • Required: true
  • Rerun command: /test e2e-llm-inference-service

