@spolti spolti commented Nov 28, 2025

…eature

chore: Fix the scenario where using the OCI model cache with the multi-node feature fails with:
denied the request: no container found with name kserve-container

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?
  • Have you linked the JIRA issue(s) to this PR?

Release note:


Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

Summary by CodeRabbit

  • New Features

    • Enhanced model storage initialization to support multi-container pod configurations with intelligent container prioritization.
    • Improved configuration to automatically select appropriate containers when multiple options are available.
  • Bug Fixes

    • Added error handling for scenarios where no valid container is found for model storage configuration.
  • Tests

    • Expanded test coverage for multi-node model storage injection scenarios and container prioritization logic.

openshift-ci bot commented Nov 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: spolti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai bot commented Nov 28, 2025

Walkthrough

The storage initializer injector now supports multi-container pod configurations for modelcar injection. It prioritizes InferenceService containers, falls back to Worker containers if absent, and conditionally configures Transformer containers. Comprehensive test coverage validates container prioritization, error handling, and multi-container scenarios.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Storage Initializer Injection Logic: pkg/webhook/admission/pod/storage_initializer_injector.go | Enhanced InjectModelcar to handle multi-container awareness: prioritizes InferenceService container, falls back to Worker container, and configures Transformer container when present. Returns error if no valid container is found. |
| Test Suite for Multi-Node Scenarios: pkg/webhook/admission/pod/storage_initializer_injector_test.go | Added TestInjectModelcarMultiNode test suite with subtests validating worker-only injection, container prioritization (kserve over worker), error cases, and combined transformer/worker scenarios. Includes helper Pod constructors for multi-node setup. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas requiring attention:
    • Container prioritization logic in InjectModelcar to ensure fallback chain is correct and error handling is appropriate
    • Test assertion accuracy for each multi-node scenario, particularly the container mount validation and prioritization logic
    • Edge cases where containers might be configured unexpectedly or conflict with existing configurations

Poem

🐰 A pod with many containers bright,
We pick the right one—what a sight!
Worker waits when kserve's away,
Transformers join to save the day,
Storage flows through each with care,
Multinode magic floating there!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly identifies the main change: fixing OCI model storage functionality for multi-node scenarios, which is the core purpose of the changeset. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/webhook/admission/pod/storage_initializer_injector_test.go (1)

4457-4555: Good multi-node coverage; consider tightening assertions and comments

The new multi-node tests and helpers nicely cover:

  • Worker-only pods getting modelcar + volume mounts.
  • Explicit error when neither kserve nor worker container exists.
  • Preference for kserve-container when both kserve and worker are present.
  • Worker + transformer both receiving mounts when there’s no kserve container.

Two small polish suggestions:

  • In the "prioritizes kserve-container over worker-container" subtest, the comment says “Both containers should have volume mounts but kserve-container should be prioritized” while the assertions only verify the mount on kserve-container. Either add an explicit check for the worker’s mounts (or lack thereof, depending on the intended behavior) or reword the comment so it matches what is actually asserted.
  • If you want parity with the single-node tests, you might also assert invariants like ShareProcessNamespace or the model-init env var in at least one multi-node case, but that’s optional.

Functionally this suite looks solid and should catch regressions in the new selection logic.

Also applies to: 4559-4619

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0eaa31 and 7c18959.

📒 Files selected for processing (3)
  • config/default/manager_image_patch.yaml (1 hunks)
  • pkg/webhook/admission/pod/storage_initializer_injector.go (1 hunks)
  • pkg/webhook/admission/pod/storage_initializer_injector_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/webhook/admission/pod/storage_initializer_injector_test.go (5)
pkg/webhook/admission/pod/storage_initializer_injector.go (1)
  • StorageInitializerInjector (48-52)
pkg/types/config.go (1)
  • StorageInitializerConfig (19-32)
pkg/utils/utils.go (1)
  • GetContainerWithName (249-256)
pkg/constants/constants.go (7)
  • ModelcarContainerName (408-408)
  • WorkerContainerName (405-405)
  • StorageInitializerVolumeName (186-186)
  • InferenceServiceContainerName (398-398)
  • TransformerContainerName (402-402)
  • StorageInitializerSourceUriInternalAnnotationKey (135-135)
  • OciURIPrefix (181-181)
pkg/apis/serving/v1beta1/podspec.go (1)
  • PodSpec (28-339)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test
  • GitHub Check: Test
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.10)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: precommit-check
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (1)
pkg/webhook/admission/pod/storage_initializer_injector.go (1)

95-112: Multi-node modelcar container selection logic looks sound

The updated InjectModelcar correctly:

  • Prefers constants.InferenceServiceContainerName when present.
  • Falls back to constants.WorkerContainerName when the inference container is absent, matching the pattern already used in InjectStorageInitializer.
  • Returns a clear configuration error only when neither container exists.
  • Optionally configures the transformer container when present.

This matches the intended multi-node OCI behavior and aligns with the new tests that cover worker-only, no-valid-container, and worker+transformer cases.

Also applies to: 114-117

…eature

chore:	Fix the scenario where using the OCI model cache with the multi-node feature fails with:
	` denied the request: no container found with name kserve-container`

Signed-off-by: Spolti <[email protected]>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
pkg/webhook/admission/pod/storage_initializer_injector.go (1)

95-117: Worker-container fallback in InjectModelcar looks correct; consider clarifying error message

The new fallback to constants.WorkerContainerName when kserve-container is absent is consistent with the InjectStorageInitializer behavior and should resolve the multi-node/worker-only scenario without impacting existing single-node behavior. The only nit is that the error still reports only kserve-container even though both the inference and worker containers are now considered valid targets; if you expect operators to debug worker-only pods directly, consider mentioning both acceptable container names in the error text.
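One way the suggested error text could look is sketched below. The helper `noTargetError` and the exact wording are hypothetical and are not taken from the PR; the point is only that the message lists every container name the injector accepts.

```go
package main

import (
	"fmt"
	"strings"
)

// noTargetError builds an error that names every container the injector
// would have accepted (hypothetical helper, for illustration only).
func noTargetError(accepted ...string) error {
	return fmt.Errorf("no container found with any of the names: %s",
		strings.Join(accepted, ", "))
}

func main() {
	// Prints: no container found with any of the names: kserve-container, worker-container
	fmt.Println(noTargetError("kserve-container", "worker-container"))
}
```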

pkg/webhook/admission/pod/storage_initializer_injector_test.go (1)

4457-4555: Multi-node InjectModelcar tests cover key paths; minor comment/assertion mismatch

The new multi-node tests and helpers exercise the worker-only, no-valid-container, inference+worker, and worker+transformer cases and align well with the updated injector logic. In the “prioritizes kserve-container over worker-container” subtest, the comment says both containers should have volume mounts but the code only asserts the mount on kserve-container; either add an assertion for the worker container or relax the comment to match the actual expectation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c18959 and 7b8a63d.

📒 Files selected for processing (2)
  • pkg/webhook/admission/pod/storage_initializer_injector.go (1 hunks)
  • pkg/webhook/admission/pod/storage_initializer_injector_test.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Test
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.9)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: precommit-check
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test
