RHOAIENG-34472: GIE v1 Migration #996
base: release-v0.15
Add the new v1alpha2 API version as the storage version for LLMInferenceService. This introduces the v1alpha2 types that will serve as the internal storage representation while v1alpha1 remains the served API version. Signed-off-by: Killian Golds <[email protected]>
Implement conversion webhooks between v1alpha1 (served) and v1alpha2 (storage). This enables the API server to convert resources between versions automatically. Signed-off-by: Killian Golds <[email protected]>
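controller-runtime commonly implements served/storage conversion with a hub-and-spoke pattern. The storage version (here v1alpha2) implements conversion.Hub, and every other version implements conversion.Convertible through ConvertTo/ConvertFrom. The stdlib-only sketch below imitates that shape so it runs without dependencies. The Hub interface is a simplified stand-in, and the types and the ModelURI / Model.URI fields are hypothetical. They are not the real LLMInferenceService schema or this PR's conversion code.

```go
package main

import "fmt"

// Hub is a simplified, dependency-free stand-in for controller-runtime's
// conversion.Hub; only the storage version implements it.
type Hub interface{ Hub() }

// ModelSpec and the types below are hypothetical, heavily simplified shapes,
// not the real LLMInferenceService fields.
type ModelSpec struct{ URI string }

// V1alpha2 plays the storage (hub) version and nests the model URI.
type V1alpha2 struct{ Model ModelSpec }

func (*V1alpha2) Hub() {}

// V1alpha1 plays the served (spoke) version with a flat field.
type V1alpha1 struct{ ModelURI string }

// ConvertTo converts the served version into the storage hub.
func (src *V1alpha1) ConvertTo(dstRaw Hub) error {
	dst, ok := dstRaw.(*V1alpha2)
	if !ok {
		return fmt.Errorf("unsupported hub type %T", dstRaw)
	}
	dst.Model.URI = src.ModelURI
	return nil
}

// ConvertFrom converts the storage hub back into the served version.
func (dst *V1alpha1) ConvertFrom(srcRaw Hub) error {
	src, ok := srcRaw.(*V1alpha2)
	if !ok {
		return fmt.Errorf("unsupported hub type %T", srcRaw)
	}
	dst.ModelURI = src.Model.URI
	return nil
}

func main() {
	in := &V1alpha1{ModelURI: "hf://example/model"}
	var stored V1alpha2
	if err := in.ConvertTo(&stored); err != nil {
		panic(err)
	}
	var out V1alpha1
	if err := out.ConvertFrom(&stored); err != nil {
		panic(err)
	}
	// Round-tripping through the hub should reproduce the original object.
	fmt.Println(stored.Model.URI, out == *in)
}
```

With a hub, each additional version needs conversions only to and from the hub, not to every other version.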
Move validation webhooks for v1alpha1 and add v1alpha2 validation webhooks to the API package. This provides validation for both API versions. Signed-off-by: Killian Golds <[email protected]>
Update CRDs to support both v1alpha1 (served) and v1alpha2 (storage) with conversion webhooks. Add CA injection patches for the conversion webhook certificates. Signed-off-by: Killian Golds <[email protected]>
Update scheduler to create dual InferencePool objects (v1 and v1alpha2). This ensures compatibility with both the new GIE v1 API and legacy v1alpha2. Uses DynamicClient for v1alpha2 InferencePool operations. Key changes:
- Creates v1 InferencePool (new GIE API) using typed client
- Creates v1alpha2 InferencePool (legacy) using dynamic client
- Manages v1alpha2 InferenceModel for scheduler routing
Signed-off-by: Killian Golds <[email protected]>
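Under the dual-pool approach, each LLMInferenceService fans out to three scheduler-facing objects. The dependency-free sketch below shows that fan-out and follows the commit message above: the v1 pool goes through the typed client, while the v1alpha2 pool and InferenceModel go through the dynamic client, which addresses resources by group/version/resource. GVR mirrors the shape of apimachinery's schema.GroupVersionResource. DesiredObject, desiredSchedulerObjects, and the "-inference-pool" name suffix are hypothetical.

```go
package main

import "fmt"

// GVR mirrors the shape of k8s.io/apimachinery's schema.GroupVersionResource,
// redefined here so the sketch has no external dependencies.
type GVR struct{ Group, Version, Resource string }

var (
	// GIE v1 InferencePool (typed client in the PR).
	poolV1 = GVR{Group: "inference.networking.k8s.io", Version: "v1", Resource: "inferencepools"}
	// Legacy GIE v1alpha2 InferencePool (dynamic client in the PR).
	poolV1alpha2 = GVR{Group: "inference.networking.x-k8s.io", Version: "v1alpha2", Resource: "inferencepools"}
	// Legacy v1alpha2 InferenceModel, still used for scheduler routing.
	modelV1alpha2 = GVR{Group: "inference.networking.x-k8s.io", Version: "v1alpha2", Resource: "inferencemodels"}
)

// DesiredObject is a hypothetical description of one object the scheduler
// reconciler should ensure exists.
type DesiredObject struct {
	GVR       GVR
	Namespace string
	Name      string
}

// desiredSchedulerObjects lists every object to reconcile for one service:
// both InferencePool versions plus the legacy InferenceModel.
// The "-inference-pool" suffix is an assumed naming convention.
func desiredSchedulerObjects(ns, svc string) []DesiredObject {
	pool := svc + "-inference-pool"
	return []DesiredObject{
		{GVR: poolV1, Namespace: ns, Name: pool},
		// Reusing the pool name is safe because the API group differs.
		{GVR: poolV1alpha2, Namespace: ns, Name: pool},
		{GVR: modelV1alpha2, Namespace: ns, Name: svc},
	}
}

func main() {
	for _, o := range desiredSchedulerObjects("demo", "llama") {
		fmt.Printf("%s/%s %s: %s/%s\n", o.GVR.Group, o.GVR.Version, o.GVR.Resource, o.Namespace, o.Name)
	}
}
```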
Update router to work with v1alpha2 API and reference the correct InferencePool. Includes updates to discovery, validation, and gateway condition handling. Signed-off-by: Killian Golds <[email protected]>
Update workload management and lifecycle handling for v1alpha2 API. Includes single-node, multi-node, storage, TLS, and OCP SCC handling. Signed-off-by: Killian Golds <[email protected]>
Update controller setup with DynamicClient for v1 InferencePool operations. Key changes:
- Register v1alpha2 API watches
- Add DynamicClient for v1 InferencePool operations
- Update scheme registration for v1alpha2 types
Signed-off-by: Killian Golds <[email protected]>
Add utility for safe Kubernetes resource naming to handle cases where service names might exceed K8s naming limits or contain invalid characters. Signed-off-by: Killian Golds <[email protected]>
Update monitoring resources and sample generation for v1alpha2 API types. Signed-off-by: Killian Golds <[email protected]>
Update integration test fixtures and builders to support v1alpha2 API types. Signed-off-by: Killian Golds <[email protected]>
Update main.go to register v1alpha2 scheme and webhooks for multi-version CRD support. Signed-off-by: Killian Golds <[email protected]>
Regenerate Go and Python client code for multi-version API support. Signed-off-by: Killian Golds <[email protected]>
Add e2e test infrastructure that parametrizes tests across both API versions. Key features:
- @pytest.fixture(params=["v1alpha1", "v1alpha2"]) for API version parametrization
- 28 test items (14 v1alpha1 + 14 v1alpha2)
- Tests conversion webhook path (v1alpha1 submission -> v1alpha2 storage)
Signed-off-by: Killian Golds <[email protected]>
Update Go module dependencies for Gateway API v1.3.0 and controller-runtime v0.22. Signed-off-by: Killian Golds <[email protected]>
Fix compatibility issues with K8s 0.34 API changes in v1beta1 types. Signed-off-by: Killian Golds <[email protected]>
Fix linter warnings from controller-runtime upgrade in v1beta1 controller. Signed-off-by: Killian Golds <[email protected]>
Update auto-generated CRDs, RBAC rules, OpenAPI specs, deepcopy functions, Python SDK docs, and violation exceptions list. Signed-off-by: Killian Golds <[email protected]>
KEDA v2.18.0 (required for controller-runtime v0.22+) requires Go 1.24.7. The UBI go-toolset:1.24 image only provides Go 1.24.6, so switch to the official golang:1.24.7 image for the builder stage. Signed-off-by: Killian Golds <[email protected]>
Signed-off-by: Pierangelo Di Pilato <[email protected]>
- Update integration tests to use v1alpha2 API and GIE v1
- Add API version parameterization to e2e stop tests for v1alpha1/v1alpha2
- Use unstructured for InferenceModel (v1alpha2) in integration tests
Signed-off-by: Killian Golds <[email protected]>
Brings in:
- odh 3.1 release tag bump (opendatahub-io#988)
- Do not register ClusterServingRuntime (opendatahub-io#990)
Stop feature conflicts resolved by keeping our GIE v1 integrated version.
Signed-off-by: Killian Golds <[email protected]>
Skipping CI for Draft Pull Request.
@KillianGolds: This pull request references RHOAIENG-34472 which is a valid jira issue.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: KillianGolds. It still needs approval from an approver in each of the affected files.
- Add processConfigTemplates() to pre-process Go templates in config files before loading into envtest (fixes CRD regex validation)
- Update multi-node tests to use SimpleWorkerPodSpec() instead of empty PodSpec (required containers field)
- Add required containers to storage test PodSpec fixtures
- Fix HTTPRoute deletion test to expect cleared condition (not True)
- Fix stop test duplicate IstioShadowService creation
Signed-off-by: Killian Golds <[email protected]>
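The processConfigTemplates() step can be pictured as rendering each config file through Go's text/template before envtest loads it. The likely goal is to keep unrendered {{ }} placeholders from reaching the CRD regex patterns. A minimal sketch follows. It assumes the files are already in memory and that template values come from a string map. The signature is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"text/template"
)

// processConfigTemplates (hypothetical signature) renders every file as a Go
// text/template. "missingkey=error" makes a missing value fail loudly instead
// of rendering "<no value>" into the config.
func processConfigTemplates(files map[string]string, values map[string]string) (map[string]string, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order, so errors are reproducible

	out := make(map[string]string, len(files))
	for _, name := range names {
		tmpl, err := template.New(name).Option("missingkey=error").Parse(files[name])
		if err != nil {
			return nil, fmt.Errorf("parse %s: %w", name, err)
		}
		var buf bytes.Buffer
		if err := tmpl.Execute(&buf, values); err != nil {
			return nil, fmt.Errorf("render %s: %w", name, err)
		}
		out[name] = buf.String()
	}
	return out, nil
}

func main() {
	files := map[string]string{"config.yaml": "image: {{ .Image }}\n"}
	rendered, err := processConfigTemplates(files, map[string]string{"Image": "quay.io/example/vllm:latest"})
	if err != nil {
		panic(err)
	}
	fmt.Print(rendered["config.yaml"])
}
```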
Signed-off-by: Killian Golds <[email protected]>
The GIE v1.1.0 migration requires Go 1.24.7 due to dependency chain:
- controller-runtime v0.22+ -> KEDA v2.18.0 -> Go 1.24.7
The UBI go-toolset:1.24 image only has Go 1.24.6, so switch to the official golang:1.24.7 image for the builder stage.
Signed-off-by: Killian Golds <[email protected]>
Force-pushed from 7f96f85 to 4fdc1d4.
Delete 16 auto-generated test files with invalid Python syntax due to OpenAPI generator bug. Add them to .openapi-generator-ignore to prevent regeneration on make precommit. Signed-off-by: Killian Golds <[email protected]>
/test e2e-llm-inference-service
@KillianGolds: A test failed.
Overview
This PR migrates KServe's LLMInferenceService to Gateway API Inference Extension (GIE) v1.1.0, implementing a dual-pool architecture for zero-downtime backward compatibility.
Upstream Challenges & Contributions:
During this migration, critical bugs were discovered in upstream dependencies that required fixes:
Major Dependency Upgrades (merged from upstream during development):
Updates KServe's LLMInferenceService from GIE v1alpha2 to v1.1.0 with:
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes RHOAIENG-34472
Key Files for Review
API Layer
- pkg/apis/serving/v1alpha2/ - New v1alpha2 API types (storage version)
- pkg/apis/serving/v1alpha1/llm_inference_service_conversion.go - v1alpha1↔v1alpha2 conversion
- pkg/apis/serving/v1alpha2/llm_inference_service_defaults.go - v1alpha2 defaulting logic
Controller Layer
- pkg/controller/llmisvc/scheduler.go - Dual InferencePool creation (GIE v1 + v1alpha2)
- pkg/controller/llmisvc/router.go - Updated for v1alpha2 types and dual pool status checks
- pkg/controller/llmisvc/controller.go - Main reconciliation loop updates
Configuration
- config/crd/full/serving.kserve.io_llminferenceservices.yaml - Multi-version CRD
- config/crd/full/llmisvc_conversion_patch.yaml - Conversion webhook config
- config/default/llmisvc_cainjection_conversion_webhook.yaml - CA injection for conversion
Registration & Dependencies
- cmd/manager/main.go - v1alpha2 API and webhook registration
- go.mod - GIE v1.1.0 dependency (sigs.k8s.io/gateway-api-inference-extension v1.1.0)
Tests
- pkg/controller/llmisvc/controller_int_stop_test.go - Integration tests for stop feature with v1alpha2
- test/e2e/llmisvc/ - E2E tests updated for dual API version support
🏗️ Architecture
Dual-Pool Strategy
API Version Flow
Migration Behavior
One-Way Migration Logic
Initial State: LLMInferenceService created
Migration Trigger: v1 InferencePool becomes Ready
serving.kserve.io/inference-pool-migrated: v1
Final State:
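A minimal sketch of this one-way migration follows. It assumes the controller keeps the migration state in the serving.kserve.io/inference-pool-migrated annotation and derives HTTPRoute backend weights from it. The pre-migration split (v1: 0, v1alpha2: 100) comes from the route config described in the reviewer notes. The post-migration 100/0 split and all type and function names are assumptions.

```go
package main

import "fmt"

// MigratedAnnotation is the annotation named in the PR description.
const MigratedAnnotation = "serving.kserve.io/inference-pool-migrated"

// Backend is a hypothetical simplification of an HTTPRoute backendRef.
type Backend struct {
	Group  string
	Weight int32
}

// reconcileMigration (hypothetical) sets the annotation the first time the
// v1 pool is Ready and never removes it afterwards, even if v1 later reports
// not Ready. It returns true when it changed the annotations.
// annotations must be non-nil, because the function writes into the map.
func reconcileMigration(annotations map[string]string, v1Ready bool) bool {
	if annotations[MigratedAnnotation] == "v1" {
		return false // already migrated: one-way, no rollback
	}
	if !v1Ready {
		return false // keep serving from the v1alpha2 pool
	}
	annotations[MigratedAnnotation] = "v1"
	return true
}

// backendWeights returns weights for the v1 and v1alpha2 pool backends.
func backendWeights(annotations map[string]string) []Backend {
	v1, legacy := int32(0), int32(100)
	if annotations[MigratedAnnotation] == "v1" {
		v1, legacy = 100, 0 // assumed post-migration split
	}
	return []Backend{
		{Group: "inference.networking.k8s.io", Weight: v1},
		{Group: "inference.networking.x-k8s.io", Weight: legacy},
	}
}

func main() {
	ann := map[string]string{}
	reconcileMigration(ann, false) // v1 pool not Ready: nothing changes
	fmt.Println(backendWeights(ann))
	reconcileMigration(ann, true)  // v1 pool Ready: migrate
	reconcileMigration(ann, false) // a later readiness flap does not roll back
	fmt.Println(backendWeights(ann))
}
```

Because the state lives in an annotation on the resource, the one-way decision persists across controller restarts rather than being recomputed from transient pool readiness.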
Feature/Issue validation/testing
Unit & Integration Tests (make test)
End-to-End Tests (test/e2e/llmisvc/)
Test Updates
Integration tests were updated to accommodate stricter v1alpha2 CRD validation (the containers field is now required in PodSpec fixtures).
Notes for reviewers
- Integration test fixtures updated - Tests now include the required containers field in PodSpec to satisfy stricter v1alpha2 CRD validation. This is not a behavioral change, only test fixture compliance.
- HTTPRoute deletion test expectation corrected - The test previously expected HTTPRoutesReady=True when Router is set to nil. The correct behavior (which the controller implements) is to clear the condition entirely. The test now verifies that the condition is nil.
- Dual InferencePool support in router - The router now checks both the inference.networking.k8s.io (v1) and inference.networking.x-k8s.io (v1alpha2) InferencePool APIs, enabling gradual migration between GIE versions.
- Route config uses weighted backends - config-llm-router-route.yaml configures both v1 (weight: 0) and v1alpha2 (weight: 100) backends, allowing traffic migration once v1 pools become ready.
Checklist:
Release note:
Re-running failed tests
- /rerun-all - rerun all failed workflows.
- /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.