-
Notifications
You must be signed in to change notification settings - Fork 40
[RHOAIENG-38499] - OCI model storage is not working with multi-node f… #998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: spolti The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WalkthroughThe storage initializer injector now supports multi-container pod configurations for modelcar injection. It prioritizes InferenceService containers, falls back to Worker containers if absent, and conditionally configures Transformer containers. Comprehensive test coverage validates container prioritization, error handling, and multi-container scenarios. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
6b74c84 to
7c18959
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/webhook/admission/pod/storage_initializer_injector_test.go (1)
4457-4555: Good multi-node coverage; consider tightening assertions and commentsThe new multi-node tests and helpers nicely cover:
- Worker-only pods getting modelcar + volume mounts.
- Explicit error when neither kserve nor worker container exists.
- Preference for
kserve-containerwhen both kserve and worker are present.- Worker + transformer both receiving mounts when there’s no kserve container.
Two small polish suggestions:
- In the
"prioritizes kserve-container over worker-container"subtest, the comment says “Both containers should have volume mounts but kserve-container should be prioritized” while the assertions only verify the mount onkserve-container. Either add an explicit check for the worker’s mounts (or lack thereof, depending on the intended behavior) or reword the comment so it matches what is actually asserted.- If you want parity with the single-node tests, you might also assert invariants like
ShareProcessNamespaceor the model-init env var in at least one multi-node case, but that’s optional.Functionally this suite looks solid and should catch regressions in the new selection logic.
Also applies to: 4559-4619
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
config/default/manager_image_patch.yaml(1 hunks)pkg/webhook/admission/pod/storage_initializer_injector.go(1 hunks)pkg/webhook/admission/pod/storage_initializer_injector_test.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/webhook/admission/pod/storage_initializer_injector_test.go (5)
pkg/webhook/admission/pod/storage_initializer_injector.go (1)
StorageInitializerInjector(48-52)pkg/types/config.go (1)
StorageInitializerConfig(19-32)pkg/utils/utils.go (1)
GetContainerWithName(249-256)pkg/constants/constants.go (7)
ModelcarContainerName(408-408)WorkerContainerName(405-405)StorageInitializerVolumeName(186-186)InferenceServiceContainerName(398-398)TransformerContainerName(402-402)StorageInitializerSourceUriInternalAnnotationKey(135-135)OciURIPrefix(181-181)pkg/apis/serving/v1beta1/podspec.go (1)
PodSpec(28-339)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: test
- GitHub Check: Test
- GitHub Check: build (3.11)
- GitHub Check: build (3.12)
- GitHub Check: build (3.9)
- GitHub Check: build (3.10)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: precommit-check
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (1)
pkg/webhook/admission/pod/storage_initializer_injector.go (1)
95-112: Multi-node modelcar container selection logic looks soundThe updated
InjectModelcarcorrectly:
- Prefers
constants.InferenceServiceContainerNamewhen present.- Falls back to
constants.WorkerContainerNamewhen the inference container is absent, matching the pattern already used inInjectStorageInitializer.- Returns a clear configuration error only when neither container exists.
- Optionally configures the transformer container when present.
This matches the intended multi-node OCI behavior and aligns with the new tests that cover worker-only, no-valid-container, and worker+transformer cases.
Also applies to: 114-117
…eature chore: Fix the scenarion when using OCI model cache with multi-node feature where the issue happens: ` denied the request: no container found with name kserve-container` Signed-off-by: Spolti <[email protected]>
7c18959 to
7b8a63d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
pkg/webhook/admission/pod/storage_initializer_injector.go (1)
95-117: Worker-container fallback inInjectModelcarlooks correct; consider clarifying error messageThe new fallback to
constants.WorkerContainerNamewhenkserve-containeris absent is consistent with theInjectStorageInitializerbehavior and should resolve the multi-node/worker-only scenario without impacting existing single-node behavior. The only nit is that the error still reports onlykserve-containereven though both the inference and worker containers are now considered valid targets; if you expect operators to debug worker-only pods directly, consider mentioning both acceptable container names in the error text.pkg/webhook/admission/pod/storage_initializer_injector_test.go (1)
4457-4555: Multi-nodeInjectModelcartests cover key paths; minor comment/assertion mismatchThe new multi-node tests and helpers exercise the worker-only, no-valid-container, inference+worker, and worker+transformer cases and align well with the updated injector logic. In the “prioritizes kserve-container over worker-container” subtest, the comment says both containers should have volume mounts but the code only asserts the mount on
kserve-container; either add an assertion for the worker container or relax the comment to match the actual expectation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pkg/webhook/admission/pod/storage_initializer_injector.go(1 hunks)pkg/webhook/admission/pod/storage_initializer_injector_test.go(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Test
- GitHub Check: build (3.10)
- GitHub Check: build (3.12)
- GitHub Check: build (3.11)
- GitHub Check: build (3.9)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: precommit-check
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
…eature
chore: Fix the scenarion when using OCI model cache with multi-node feature where the
issue happens:
denied the request: no container found with name kserve-containerWhat this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Fixes #
Type of changes
Please delete options that are not relevant.
Feature/Issue validation/testing:
Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Test B
Logs
Special notes for your reviewer:
Checklist:
Release note:
Re-running failed tests
/rerun-all- rerun all failed workflows./rerun-workflow <workflow name>- rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.Summary by CodeRabbit
New Features
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.