Conversation


@mholder6 mholder6 commented Nov 26, 2025

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?
  • Have you linked the JIRA issue(s) to this PR?

Release note:


Re-running failed tests

  • /rerun-all - rerun all failed workflows.
  • /rerun-workflow <workflow name> - rerun a specific failed workflow. Only one workflow name can be specified. Multiple /rerun-workflow commands are allowed per comment.

Summary by CodeRabbit

  • New Features

    • Added explicit end-to-end tests for explainer component across multiple test flows.
    • Introduced new test utilities for HTTPRoute validation.
  • Bug Fixes

    • Improved storage error handling with better error messages and context manager usage for archive extraction.
  • Refactor

    • Separated storage functionality into an independent package; storage imports now use kserve_storage module.
    • Restructured container configurations with simplified startup logic and unified deployment templates.
  • Chores

    • Updated container image versions and enhanced security context settings across templates.
    • Relaxed adapter schema validation to support flexible configurations.
    • Updated workflow configurations for improved test coverage and security scanning.



openshift-ci bot commented Nov 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mholder6

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai bot commented Nov 26, 2025

Walkthrough

This PR refactors the storage module into a standalone package, updates CRD schemas to accept flexible adapter configurations, refreshes LLM container images with enhanced security contexts, separates E2E tests for the explainer component, and adds storage module installation to multiple Dockerfiles.

Changes

  • **Configuration & Linting** (`.flake8`): Added a pattern to exclude `python/kserve/build/**/*.py` from Flake8 checks
  • **CI/CD Workflows** (`.github/workflows/e2e-test.yml`): Split explainer E2E tests into separate steps for the transformer/mms/collocation flow and the path-based routing flow
  • **CI/CD Workflows** (`.github/workflows/python-test.yml`): Added storage-related tests to the Python test workflow with pytest coverage
  • **CI/CD Workflows** (`.github/workflows/scheduled-image-scan.yml`): Added `fail-fast: false` to matrix strategies; replaced `continue-on-error` with category labels for SARIF uploads
  • **CRD Schemas** (`charts/llmisvc-crd/templates/serving.kserve.io_llminferenceservice*.yaml`, `config/crd/full/serving.kserve.io_llminferenceservice*.yaml`, `test/crds/serving.kserve.io_all_crds.yaml`): Replaced the strict adapter schema with `x-kubernetes-preserve-unknown-fields: true` to allow flexible adapter configurations
  • **LLM Configuration Templates** (`charts/llmisvc-resources/templates/config-llm-*.yaml`, `config/llmisvc/config-llm-*.yaml`): Updated container images (v0.2.0 → v0.2.2-dev), added a `/mnt/models` mount, enhanced security contexts with `readOnlyRootFilesystem`/`runAsNonRoot`/capabilities/seccomp settings, increased probe timings, added a connector flag, and unified `vllm serve` logic for the template/worker split
  • **LLM Router Configuration** (`charts/llmisvc-resources/templates/config-llm-router-route.yaml`, `config/llmisvc/config-llm-router-route.yaml`): Minor trailing-newline adjustment
  • **LLM Scheduler Configuration** (`charts/llmisvc-resources/templates/config-llm-scheduler.yaml`, `config/llmisvc/config-llm-scheduler.yaml`): Added `--modelServerMetricsScheme https` and `--modelServerMetricsHttpsInsecureSkipVerify` arguments
  • **Kustomization** (`config/llmisvc/kustomization.yaml`): Re-added `config-llm-worker-data-parallel.yaml` to the resources list
  • **Go API Types & Defaults** (`pkg/apis/serving/v1alpha1/llm_inference_service_types.go`, `pkg/apis/serving/v1alpha1/llm_inference_service_defaults.go`, `pkg/apis/serving/v1alpha1/zz_generated.deepcopy.go`): Changed `LoRASpec.Adapters` from `[]ModelSpec` to `[]LLMModelSpec`, updated `SetDefaults` to accept and propagate context, and initialized nil container slices
  • **Config Merge Logic** (`pkg/controller/v1alpha1/llmisvc/config_merge.go`, `pkg/controller/v1alpha1/llmisvc/config_merge_test.go`): Updated the `MergeSpecs` signature to accept `context.Context`, added context propagation to `SetDefaults` calls, added an `isDefaultBackendRef` helper, and updated test expectations for the new adapter types
  • **Config Presets Tests** (`pkg/controller/v1alpha1/llmisvc/config_presets_test.go`): Updated spec references from `Worker` to `Template` in test expectations
  • **Lifecycle & CRUD** (`pkg/controller/v1alpha1/llmisvc/lifecycle_crud.go`): Added nil checks for owner in Create/Update/Delete operations; improved log-line formatting with string trimming
  • **Generated OpenAPI** (`pkg/openapi/openapi_generated.go`): Removed empty/whitespace-only lines for cleanup
  • **Test Utilities** (`pkg/testing/httproute_matchers.go`, `pkg/testing/log.go`): Added new Gomega matchers for HTTPRoute validation (gateway/backend refs) and a test-logger utility
  • **Python Storage Module Extraction** (`python/kserve/kserve/storage/__init__.py`, `python/kserve/kserve/storage/storage.py`, `python/kserve/kserve/storage/test/test_*.py`): Removed the storage module from kserve (files deleted: `__init__.py`, `storage.py`, `test_*_storage.py`, `test_storage.py`)
  • **Python Server Dockerfiles** (`python/huggingface_server.Dockerfile`, `python/huggingface_server_cpu.Dockerfile`, `python/lgb.Dockerfile`, `python/paddle.Dockerfile`, `python/pmml.Dockerfile`, `python/sklearn.Dockerfile`, `python/xgb.Dockerfile`): Added storage-module build and copy steps in the builder and production stages
  • **Python Storage Initializer** (`python/storage-initializer.Dockerfile`, `python/storage-initializer/scripts/initializer-entrypoint`): Replaced kserve references with storage; updated imports to `kserve_storage`
  • **Python Standalone Storage Package** (`python/storage/kserve_storage/kserve_storage.py`, `python/storage/pyproject.toml`, `python/storage/test/test_gcs_storage.py`): Created a standalone storage package with improved archive extraction (context managers), updated dependency constraints, and a fixed GCS test mock structure
  • **Python Server Dependencies** (`python/huggingfaceserver/pyproject.toml`, `python/lgbserver/pyproject.toml`, `python/paddleserver/pyproject.toml`, `python/pmmlserver/pyproject.toml`, `python/sklearnserver/pyproject.toml`, `python/xgbserver/pyproject.toml`): Split the `kserve[storage]` dependency into `kserve` + `kserve-storage`
  • **Python Server Imports** (`python/huggingfaceserver/huggingfaceserver/__main__.py`, `python/lgbserver/lgbserver/model.py`, `python/paddleserver/paddleserver/model.py`, `python/pmmlserver/pmmlserver/model.py`, `python/sklearnserver/sklearnserver/model.py`, `python/xgbserver/xgbserver/model.py`): Updated the `Storage` import from `kserve.storage` to `kserve_storage`
  • **Python kserve Makefile & Dependencies** (`python/kserve/Makefile`, `python/kserve/pyproject.toml`): Removed the storage extra from kserve, added separate storage-package installation, and updated optional dependencies to reference `kserve-storage==2`
  • **Explainer E2E Tests** (`test/e2e/explainer/test_art_explainer.py`): Removed the `path_based_routing` decorator; added random UUID suffixes to service names for test isolation
  • **Graph E2E Tests** (`test/e2e/graph/test_inference_graph.py`): Simplified image-version handling with a single `img_version` variable; updated image constants to use f-strings
  • **Documentation Samples** (`docs/samples/explanation/alibi/alibiexplainer/alibiexplainer/__main__.py`, `docs/samples/explanation/alibi/alibiexplainer/tests/test_anchor_*.py`): Updated the `Storage` import from `kserve.storage` to `kserve_storage`

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Consumer as Python Server
    participant KServeOld as kserve[storage]
    participant KServeNew as kserve (main)
    participant StorageNew as kserve-storage

    Consumer->>KServeOld: import Storage (old flow)
    KServeOld->>KServeOld: from kserve.storage

    Note over Consumer,StorageNew: After refactoring:

    Consumer->>KServeNew: import kserve
    Consumer->>StorageNew: from kserve_storage import Storage
    StorageNew->>StorageNew: Standalone package
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • Storage module extraction: Verify all import paths are correctly updated across Python packages; check for missed references to kserve.storage in consumer code
  • CRD schema relaxation: Validate that removing strict adapter schema validation doesn't break existing configurations; ensure x-kubernetes-preserve-unknown-fields is the intended approach
  • Type system changes: Review LoRASpec.Adapters type change from ModelSpec to LLMModelSpec across all generated code and tests; verify serialization/deserialization compatibility
  • Context propagation: Ensure context is correctly threaded through MergeSpecs and SetDefaults calls; check for nil context handling
  • LLM config refactoring: Complex container startup logic changes across multiple templates; verify unified vllm serve invocation is functionally equivalent to previous leader/worker branching
  • Security context additions: Confirm security settings don't inadvertently restrict required capabilities (notably the IPC_LOCK and SYS_RAWIO handling in data-parallel configs)

Possibly related PRs

  • 20251118 sync upstream #976: Adds identical .flake8 exclude pattern and contains overlapping upstream sync changes affecting storage module organization

Suggested labels

lgtm, tide/merge-method-merge

Suggested reviewers

  • bartoszmajsak
  • pierDipi
  • spolti

Poem

🐰 Whiskers twitched with delight,
Storage hops to its own right,
Adapters now flow wild and free,
Security contexts set spree,
v0.2.2 brings the spring!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • **Docstring Coverage** ⚠️ Warning: Docstring coverage is 26.32%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
  • **Title check** ❓ Inconclusive: The PR title '20251126 sync upstream' uses a date-based format and the generic term 'sync upstream' without conveying the specific changes or main objectives. Replace it with a descriptive title that summarizes the key changes, such as 'Refactor storage module to separate package and update LLM service configurations'.
✅ Passed checks (1 passed)
  • **Description Check** ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
config/llmisvc/config-llm-worker-data-parallel.yaml (1)

18-183: The repository clone failed, but the issues can still be verified from the code snippet quoted in the review comment together with general principles of shell scripting and Go templating.

Analysis of the three issues:

  1. Broken quoting around --served-model-name: The review shows code like:

     ```sh
     eval "vllm serve \
       /mnt/models \
       --served-model-name "{{ .Spec.Model.Name }}" \
     ```

     This is indeed a quoting error: inside a double-quoted string, the inner `"` prematurely terminates the outer quote, breaking the shell syntax.

  2. $(LWS_LEADER_ADDRESS) vs $LWS_LEADER_ADDRESS: The code uses $(LWS_LEADER_ADDRESS) which in shell syntax means "execute command named LWS_LEADER_ADDRESS". This is universally true in POSIX shells - command substitution syntax is standard.

  3. Pointer-valued .Spec.Parallelism.* fields: The review claims these are Go *int32 types. The code snippet shows usage like {{ .Spec.Parallelism.Tensor }} which would render the pointer value itself rather than the dereferenced integer without a helper function. This is a standard limitation of Go templates.

These are fundamental shell and template syntax issues that can be verified without repository-specific context, based on universal shell and Go templating mechanics.


Shell template has quoting and expansion bugs that will likely break the worker pods.

There are three correctness issues in the new vllm serve shell snippets that are serious enough to prevent the containers from starting correctly:

  1. Broken quoting around --served-model-name (template + worker)
     In both template and worker sections, lines like:

     ```sh
     eval "vllm serve \
       /mnt/models \
       --served-model-name "{{ .Spec.Model.Name }}" \
     ```

     inject `"{{ .Spec.Model.Name }}"` inside an already double-quoted string, which terminates the outer quote early and produces invalid shell syntax once the template is rendered. This is a hard runtime error.

     Remove the inner quotes:

     ```diff
      eval "vllm serve \
        /mnt/models \
     -  --served-model-name "{{ .Spec.Model.Name }}" \
     +  --served-model-name {{ .Spec.Model.Name }} \
     ```

     Apply the same change in the `worker` block. If you need to support model names with spaces, escape the inner quotes instead: `--served-model-name \"{{ .Spec.Model.Name }}\"`.
    
    
  2. $(LWS_LEADER_ADDRESS) is command substitution, not env var expansion
     Lines like:

     ```sh
     --data-parallel-address $(LWS_LEADER_ADDRESS) \
     ```

     use POSIX command substitution, which attempts to execute a command named `LWS_LEADER_ADDRESS`. To use the environment variable, change to:

     ```diff
     -  --data-parallel-address $(LWS_LEADER_ADDRESS) \
     +  --data-parallel-address $LWS_LEADER_ADDRESS \
     ```

     Apply the same change in the `worker` block.
    
    
  3. Pointer-valued .Spec.Parallelism.* fields in templates
     Using Go pointer types directly in templates like:

     ```
     {{- if .Spec.Parallelism.Tensor -}}--tensor-parallel-size {{ .Spec.Parallelism.Tensor }}{{- end }}
     --data-parallel-size-local {{ or .Spec.Parallelism.DataLocal 1 }} \
     ```

     will render pointer values (e.g., memory addresses) rather than integers unless dereferenced. Either add template helper functions to dereference these pointers or change the underlying API fields to non-pointer numeric types.
pkg/controller/v1alpha1/llmisvc/config_presets_test.go (1)

431-434: Remove duplicate yaml.Unmarshal call.

The same yaml.Unmarshal(data, config) is called twice consecutively with identical error handling. This appears to be an accidental duplication.

```diff
 func loadConfig(t *testing.T, data []byte, filePath string) *v1alpha1.LLMInferenceServiceConfig {
 	config := &v1alpha1.LLMInferenceServiceConfig{}
 	if err := yaml.Unmarshal(data, config); err != nil {
 		t.Errorf("Failed to unmarshal YAML from %s: %v", filePath, err)
 		return nil
 	}
-	if err := yaml.Unmarshal(data, config); err != nil {
-		t.Errorf("Failed to unmarshal YAML from %s: %v", filePath, err)
-		return nil
-	}

 	expectedGroupVersion := v1alpha1.LLMInferenceServiceConfigGVK.GroupVersion().String()
```
charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml (1)

6-103: Consolidate duplicated template and worker configurations to reduce maintenance burden.

The template section (lines 6–103) and worker section (lines 104–183) contain nearly identical container configurations, with only minor differences:

  • Line 25 (template only): --api-server-count ${VLLM_API_SERVER_COUNT:-8}
  • Line 118 vs. 20: START_RANK calculation differs (worker uses ${LWS_WORKER_INDEX:-0})
  • Line 134 (worker only): --headless flag

This duplicated structure creates a maintenance risk. If you need to refactor these configurations, consider using YAML anchors (& and *), shared templates, or a template engine feature to DRY up the code.

Example using YAML anchors (if supported by your tooling):

```yaml
_container_config: &container_config
  imagePullPolicy: IfNotPresent
  name: main
  ports:
    - containerPort: 8000
      protocol: TCP
  # ... shared fields ...

template:
  containers:
    - <<: *container_config
      # template-specific overrides
worker:
  containers:
    - <<: *container_config
      # worker-specific overrides
```

Also applies to: 104-183

♻️ Duplicate comments (5)
config/llmisvc/config-llm-template.yaml (2)

8-8: [Duplicate concern from charts template] Verify dev image and model path breaking change.

Same concerns as charts/llmisvc-resources/templates/config-llm-template.yaml: dev image validity, hardcoded /mnt/models vs. templated model name in args, and ephemeral model-cache storage.

Also applies to: 17-17


38-44: [Duplicate concern] Security: readOnlyRootFilesystem: false combined with runAsNonRoot: true.

Same concern as charts template: consider enabling readOnlyRootFilesystem: true with explicit writable volume mounts for improved security posture.

charts/llmisvc-resources/templates/config-llm-prefill-template.yaml (2)

9-9: [Consistent with main template] Verify dev image and model path breaking change.

Same pattern as main template (File 1): dev image v0.2.2-dev, hardcoded /mnt/models command path, and templated model name in args (line 21). Requires same verification: dev image validity, intentionality of hardcoded path, and model-cache persistence strategy.

Also applies to: 18-18


39-45: [Consistent with main template] Security: readOnlyRootFilesystem: false combined with runAsNonRoot: true.

Same security posture consideration as main template: evaluate enabling readOnlyRootFilesystem: true with explicit writable volume mounts.

config/llmisvc/config-llm-prefill-template.yaml (1)

9-9: [Duplicate] Same concerns as charts/llmisvc-resources/templates/config-llm-prefill-template.yaml.

This config file mirrors the charts template. Refer to earlier comments on dev image validity, model path breaking change, and security context configuration.

Also applies to: 18-18, 39-45

🧹 Nitpick comments (17)
python/lgb.Dockerfile (1)

28-33: Align storage build pattern with kserve and lgbserver.

The storage build uses uv pip install . --no-cache (line 33) while kserve (line 26) and lgbserver (line 40) both use uv sync --active --no-cache after copying their source directories. This inconsistency may indicate a refactoring oversight or an intentional difference that should be clarified.

For consistency and to follow the established pattern in this Dockerfile, consider updating line 33 to use uv sync instead:

```diff
 COPY storage storage
-RUN cd storage && uv pip install . --no-cache
+RUN cd storage && uv sync --active --no-cache
```

If uv pip install . is intentional (e.g., to avoid re-locking dependencies), document the rationale.

pkg/testing/httproute_matchers.go (2)

17-29: Package name testing under pkg/testing can be confusing with stdlib testing

Using package testing for helpers that live in non-_test.go code can be easy to confuse with the standard library testing package when importing both. Consider renaming the package (e.g. testutil or ksrvtesting) or consistently aliasing this import in tests to avoid ambiguity.


31-51: extractHTTPRoute logic looks solid; consider clarifying the error text

The type-switch coverage and nil-pointer handling look correct for HTTPRoute / HTTPRouteSpec (by value and pointer). The only nit is the default error string:

```go
return nil, fmt.Errorf("expected gatewayapi.HTTPRoute gatewayapi.HTTPRouteSpec, but got %T", actual)
```

You may want to tweak it to read more clearly, e.g.:

```diff
- return nil, fmt.Errorf("expected gatewayapi.HTTPRoute gatewayapi.HTTPRouteSpec, but got %T", actual)
+ return nil, fmt.Errorf("expected gatewayapi.HTTPRoute or gatewayapi.HTTPRouteSpec, but got %T", actual)
```

Purely cosmetic, but it makes debugging mis-typed matcher inputs easier.

.github/workflows/python-test.yml (1)

77-81: Verify storage test environment and coverage target

Good to see storage tests wired into CI. A couple of things to double-check:

  • Reusing the python/kserve/.venv venv is fine as long as it actually has the kserve_storage package and its dependencies installed (either from the in-repo python/storage project or from the pinned kserve-storage extra).
  • Given the implementation lives under kserve_storage, confirm that --cov=storage is the intended coverage target; you may want --cov=kserve_storage (or similar) so coverage is reported against the real package.
python/storage/pyproject.toml (1)

15-16: Confirm huggingface-hub floor and kserve-storage version alignment

Raising the huggingface-hub[hf-transfer] minimum and setting kserve-storage to 0.15.2 look reasonable, but please verify:

  • That all consumers (especially HuggingFace server images) are compatible with huggingface-hub>=0.32.0.
  • How this 0.15.2 version relates to the kserve-storage==2 pin in python/kserve/pyproject.toml; if they are meant to represent the same package lineage, aligning or clearly differentiating these versions will avoid confusion.

Also applies to: 18-27

python/kserve/pyproject.toml (1)

58-66: Revisit kserve-storage pin and temporary uv index configuration

The storage extra now pins kserve-storage==2, while the in-repo storage project is versioned 0.15.2. If these are intended to represent the same package, consider aligning versions or using a clearer constraint so it’s obvious which source (local vs published) is authoritative.

The new [tool.uv] block pointing to https://test.pypi.org/simple/ with a TODO is fine for short-term testing, but please ensure this doesn’t leak into production/official release workflows and plan to switch back to the standard PyPI index (or a controlled internal index) once the temporary flow is no longer needed.

Also applies to: 97-101

.github/workflows/e2e-test.yml (1)

418-423: Good test isolation for explainer component.

Separating explainer E2E tests into a dedicated step improves test organization and makes failures easier to diagnose. The lower parallelism count (1 vs 6) for explainer tests may be intentional if there are fewer test cases.

Consider verifying whether the explainer test suite would benefit from higher parallelism (count > 1) to reduce overall test execution time, especially as the test suite grows.

python/storage/kserve_storage/kserve_storage.py (1)

788-796: Archive extraction improvements with proper resource management.

The use of context managers for both tar and zip extraction is good practice. The tar extraction includes a data filter for security.

Static analysis flagged the use of tarfile.extractall() (S202). While the filter="data" parameter mitigates path traversal risks in Python 3.12+, consider verifying that the minimum Python version requirement (3.9 per line 6) supports this filter. For older Python versions, you may need additional safeguards or path validation.

Apply this verification script to check Python version compatibility:

```bash
#!/bin/bash
# Check Python version requirements across storage-related files
rg -n "requires-python" python/storage/pyproject.toml python/*/pyproject.toml
```
python/huggingface_server.Dockerfile (1)

155-155: Verify ownership of copied storage directory.

Line 154 copies the kserve directory with --chown=kserve:kserve, but line 155 copies the storage directory without explicit ownership. Since the final image runs as user 1000 (kserve), consider adding --chown=kserve:kserve to ensure consistent ownership.

Apply this diff to ensure consistent ownership:

```diff
 COPY --from=build ${WORKSPACE_DIR}/kserve kserve
-COPY --from=build ${WORKSPACE_DIR}/storage storage
+COPY --from=build --chown=kserve:kserve ${WORKSPACE_DIR}/storage storage
 COPY --from=build ${WORKSPACE_DIR}/huggingfaceserver huggingfaceserver
```
 COPY --from=build ${WORKSPACE_DIR}/huggingfaceserver huggingfaceserver
config/llmisvc/config-llm-scheduler.yaml (1)

50-66: Validate new scheduler metrics flags and TLS verification posture.

The added args --modelServerMetricsScheme, "https", and --modelServerMetricsHttpsInsecureSkipVerify look reasonable for enabling HTTPS metrics, but:

  • Ensure the llm-d-inference-scheduler:v0.2.0 binary actually supports these flags (otherwise the pod will fail to start).
  • Double-check that disabling TLS verification for model-server metrics is acceptable in your threat model; if not, you may want a variant without ...HttpsInsecureSkipVerify or make it configurable instead of hard-coded.
config/llmisvc/config-llm-decode-template.yaml (1)

75-91: Init routing-sidecar securityContext: consider adding seccomp profile for parity.

You’ve locked down the init container with allowPrivilegeEscalation: false, runAsNonRoot: true, and drop: [ALL], which is good. For consistency with the main container and the broader hardening effort, you may also want to add seccompProfile: { type: RuntimeDefault } here unless there’s a specific reason not to.

charts/llmisvc-resources/templates/config-llm-template.yaml (1)

38-44: Security: readOnlyRootFilesystem: false combined with runAsNonRoot: true.

The container permits filesystem writes despite restricting privilege escalation. If the intent is to protect the root filesystem, consider enabling readOnlyRootFilesystem: true and explicitly mounting only writable volumes (e.g., emptyDir, configMap) at specific paths required by the application.

pkg/controller/v1alpha1/llmisvc/lifecycle_crud.go (1)

157-160: Consider using strings.TrimPrefix for clarity.

strings.Replace with a count of 1 works, but strings.TrimPrefix better expresses the intent of removing a leading character.

```diff
 func logLineForObject(obj client.Object) string {
 	// Note: don't use `obj.GetObjectKind()` as it's always empty.
-	return strings.Replace(reflect.TypeOf(obj).String(), "*", "", 1)
+	return strings.TrimPrefix(reflect.TypeOf(obj).String(), "*")
 }
```
charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml (3)

6-103: Add resource requests and limits for predictable cluster scheduling.

Neither the template nor worker containers define resources.requests or resources.limits. This omission can cause unpredictable cluster behavior, poor bin-packing, and potential node pressure. LLM inference workloads typically require substantial memory and GPU/compute resources.

Add appropriate resource specifications, for example:

```yaml
securityContext:
  # ... existing security context ...
resources:
  requests:
    memory: "8Gi"
    cpu: "2"
    nvidia.com/gpu: "1"  # if GPU is needed
  limits:
    memory: "16Gi"
    cpu: "4"
    nvidia.com/gpu: "1"
```

Adjust the values to match your actual workload requirements.

Also applies to: 104-183


49-60: Justify adding privileged capabilities despite hardening efforts.

The security context drops all capabilities (drop: ALL) but selectively adds IPC_LOCK and SYS_RAWIO (lines 54–56, 152–154). While this hardened approach is sound, these are privileged capabilities that deserve documentation. Add an inline comment explaining why they are required (e.g., "IPC_LOCK for memory pinning in vLLM inference, SYS_RAWIO for hardware acceleration").

Also applies to: 147-158


51-51: Clarify the readOnlyRootFilesystem: false setting.

Setting readOnlyRootFilesystem: false (lines 51, 149) somewhat undermines the security hardening achieved by dropping capabilities and enforcing runAsNonRoot: true. If the writable filesystem is necessary for:

  • /home (emptyDir for temporary files),
  • /dev/shm (shared memory),
  • /models (model caching),

document this trade-off with an inline comment. Alternatively, verify that all necessary directories are indeed mounted as writable volumes and the root filesystem itself can remain read-only.

Also applies to: 149-149

config/llmisvc/config-llm-prefill-worker-data-parallel.yaml (1)

1-184: Significant duplication with Helm chart template.

This file is nearly identical to charts/llmisvc-resources/templates/config-llm-prefill-worker-data-parallel.yaml. Having two separate copies of the same configuration increases maintenance burden and risk of configuration drift.

Consider consolidating these configurations or generating one from the other to maintain a single source of truth.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea020d0 and dfd4f0b.

⛔ Files ignored due to path filters (15)
  • python/aiffairness/uv.lock is excluded by !**/*.lock
  • python/artexplainer/uv.lock is excluded by !**/*.lock
  • python/custom_model/uv.lock is excluded by !**/*.lock
  • python/custom_tokenizer/uv.lock is excluded by !**/*.lock
  • python/custom_transformer/uv.lock is excluded by !**/*.lock
  • python/huggingfaceserver/uv.lock is excluded by !**/*.lock
  • python/kserve/uv.lock is excluded by !**/*.lock
  • python/lgbserver/uv.lock is excluded by !**/*.lock
  • python/paddleserver/uv.lock is excluded by !**/*.lock
  • python/pmmlserver/uv.lock is excluded by !**/*.lock
  • python/sklearnserver/uv.lock is excluded by !**/*.lock
  • python/storage/uv.lock is excluded by !**/*.lock
  • python/test_resources/graph/error_404_isvc/uv.lock is excluded by !**/*.lock
  • python/test_resources/graph/success_200_isvc/uv.lock is excluded by !**/*.lock
  • python/xgbserver/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (74)
  • .flake8 (1 hunks)
  • .github/workflows/e2e-test.yml (2 hunks)
  • .github/workflows/python-test.yml (1 hunks)
  • .github/workflows/scheduled-image-scan.yml (4 hunks)
  • charts/llmisvc-crd/templates/serving.kserve.io_llminferenceserviceconfigs.yaml (1 hunks)
  • charts/llmisvc-crd/templates/serving.kserve.io_llminferenceservices.yaml (1 hunks)
  • charts/llmisvc-resources/templates/config-llm-decode-template.yaml (6 hunks)
  • charts/llmisvc-resources/templates/config-llm-decode-worker-data-parallel.yaml (7 hunks)
  • charts/llmisvc-resources/templates/config-llm-prefill-template.yaml (4 hunks)
  • charts/llmisvc-resources/templates/config-llm-prefill-worker-data-parallel.yaml (4 hunks)
  • charts/llmisvc-resources/templates/config-llm-router-route.yaml (1 hunks)
  • charts/llmisvc-resources/templates/config-llm-scheduler.yaml (2 hunks)
  • charts/llmisvc-resources/templates/config-llm-template.yaml (4 hunks)
  • charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml (4 hunks)
  • config/crd/full/serving.kserve.io_llminferenceserviceconfigs.yaml (1 hunks)
  • config/crd/full/serving.kserve.io_llminferenceservices.yaml (1 hunks)
  • config/llmisvc/config-llm-decode-template.yaml (6 hunks)
  • config/llmisvc/config-llm-decode-worker-data-parallel.yaml (7 hunks)
  • config/llmisvc/config-llm-prefill-template.yaml (4 hunks)
  • config/llmisvc/config-llm-prefill-worker-data-parallel.yaml (4 hunks)
  • config/llmisvc/config-llm-router-route.yaml (1 hunks)
  • config/llmisvc/config-llm-scheduler.yaml (2 hunks)
  • config/llmisvc/config-llm-template.yaml (4 hunks)
  • config/llmisvc/config-llm-worker-data-parallel.yaml (4 hunks)
  • config/llmisvc/kustomization.yaml (1 hunks)
  • docs/samples/explanation/alibi/alibiexplainer/alibiexplainer/__main__.py (1 hunks)
  • docs/samples/explanation/alibi/alibiexplainer/tests/test_anchor_images.py (1 hunks)
  • docs/samples/explanation/alibi/alibiexplainer/tests/test_anchor_tabular.py (1 hunks)
  • pkg/apis/serving/v1alpha1/llm_inference_service_defaults.go (1 hunks)
  • pkg/apis/serving/v1alpha1/llm_inference_service_types.go (1 hunks)
  • pkg/apis/serving/v1alpha1/zz_generated.deepcopy.go (1 hunks)
  • pkg/controller/v1alpha1/llmisvc/config_merge.go (11 hunks)
  • pkg/controller/v1alpha1/llmisvc/config_merge_test.go (15 hunks)
  • pkg/controller/v1alpha1/llmisvc/config_presets_test.go (8 hunks)
  • pkg/controller/v1alpha1/llmisvc/lifecycle_crud.go (5 hunks)
  • pkg/openapi/openapi_generated.go (0 hunks)
  • pkg/testing/httproute_matchers.go (1 hunks)
  • pkg/testing/log.go (1 hunks)
  • python/huggingface_server.Dockerfile (2 hunks)
  • python/huggingface_server_cpu.Dockerfile (2 hunks)
  • python/huggingfaceserver/huggingfaceserver/__main__.py (1 hunks)
  • python/huggingfaceserver/pyproject.toml (1 hunks)
  • python/kserve/Makefile (1 hunks)
  • python/kserve/kserve/storage/__init__.py (0 hunks)
  • python/kserve/kserve/storage/storage.py (0 hunks)
  • python/kserve/kserve/storage/test/test_azure_storage.py (0 hunks)
  • python/kserve/kserve/storage/test/test_gcs_storage.py (0 hunks)
  • python/kserve/kserve/storage/test/test_hf_storage.py (0 hunks)
  • python/kserve/kserve/storage/test/test_s3_storage.py (0 hunks)
  • python/kserve/kserve/storage/test/test_storage.py (0 hunks)
  • python/kserve/pyproject.toml (2 hunks)
  • python/lgb.Dockerfile (2 hunks)
  • python/lgbserver/lgbserver/model.py (1 hunks)
  • python/lgbserver/pyproject.toml (1 hunks)
  • python/paddle.Dockerfile (2 hunks)
  • python/paddleserver/paddleserver/model.py (1 hunks)
  • python/paddleserver/pyproject.toml (1 hunks)
  • python/pmml.Dockerfile (2 hunks)
  • python/pmmlserver/pmmlserver/model.py (1 hunks)
  • python/pmmlserver/pyproject.toml (1 hunks)
  • python/sklearn.Dockerfile (2 hunks)
  • python/sklearnserver/pyproject.toml (1 hunks)
  • python/sklearnserver/sklearnserver/model.py (1 hunks)
  • python/storage-initializer.Dockerfile (2 hunks)
  • python/storage-initializer/scripts/initializer-entrypoint (1 hunks)
  • python/storage/kserve_storage/kserve_storage.py (4 hunks)
  • python/storage/pyproject.toml (2 hunks)
  • python/storage/test/test_gcs_storage.py (2 hunks)
  • python/xgb.Dockerfile (2 hunks)
  • python/xgbserver/pyproject.toml (1 hunks)
  • python/xgbserver/xgbserver/model.py (1 hunks)
  • test/crds/serving.kserve.io_all_crds.yaml (2 hunks)
  • test/e2e/explainer/test_art_explainer.py (1 hunks)
  • test/e2e/graph/test_inference_graph.py (1 hunks)
💤 Files with no reviewable changes (8)
  • pkg/openapi/openapi_generated.go
  • python/kserve/kserve/storage/test/test_hf_storage.py
  • python/kserve/kserve/storage/test/test_storage.py
  • python/kserve/kserve/storage/storage.py
  • python/kserve/kserve/storage/test/test_azure_storage.py
  • python/kserve/kserve/storage/__init__.py
  • python/kserve/kserve/storage/test/test_gcs_storage.py
  • python/kserve/kserve/storage/test/test_s3_storage.py
🧰 Additional context used
🧬 Code graph analysis (14)
docs/samples/explanation/alibi/alibiexplainer/tests/test_anchor_images.py (1)
python/storage/kserve_storage/kserve_storage.py (1)
  • Storage (59-798)
python/sklearnserver/sklearnserver/model.py (1)
python/storage/kserve_storage/kserve_storage.py (1)
  • Storage (59-798)
python/xgbserver/xgbserver/model.py (1)
python/storage/kserve_storage/kserve_storage.py (1)
  • Storage (59-798)
pkg/apis/serving/v1alpha1/zz_generated.deepcopy.go (1)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (1)
  • LLMModelSpec (118-137)
python/huggingfaceserver/huggingfaceserver/__main__.py (1)
python/storage/kserve_storage/kserve_storage.py (1)
  • Storage (59-798)
pkg/apis/serving/v1alpha1/llm_inference_service_defaults.go (1)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (3)
  • LLMInferenceService (40-46)
  • LLMInferenceServiceSpec (60-89)
  • WorkloadSpec (92-115)
pkg/testing/httproute_matchers.go (2)
pkg/controller/v1alpha1/llmisvc/fixture/gwapi_builders.go (2)
  • HTTPRoute (99-118)
  • BackendRefs (308-310)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (1)
  • HTTPRouteSpec (184-194)
docs/samples/explanation/alibi/alibiexplainer/alibiexplainer/__main__.py (1)
python/storage/kserve_storage/kserve_storage.py (1)
  • Storage (59-798)
python/paddleserver/paddleserver/model.py (1)
python/storage/kserve_storage/kserve_storage.py (1)
  • Storage (59-798)
python/pmmlserver/pmmlserver/model.py (1)
python/storage/kserve_storage/kserve_storage.py (1)
  • Storage (59-798)
docs/samples/explanation/alibi/alibiexplainer/tests/test_anchor_tabular.py (1)
python/storage/kserve_storage/kserve_storage.py (1)
  • Storage (59-798)
pkg/controller/v1alpha1/llmisvc/config_merge.go (2)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (3)
  • LLMInferenceServiceSpec (60-89)
  • LLMInferenceService (40-46)
  • SchedulerSpec (222-231)
pkg/controller/v1alpha1/llmisvc/fixture/gwapi_builders.go (1)
  • BackendRefs (308-310)
pkg/controller/v1alpha1/llmisvc/config_presets_test.go (1)
pkg/apis/serving/v1beta1/podspec.go (1)
  • PodSpec (28-339)
pkg/controller/v1alpha1/llmisvc/config_merge_test.go (5)
pkg/apis/serving/v1alpha1/llm_inference_service_types.go (6)
  • LLMModelSpec (118-137)
  • LLMInferenceServiceSpec (60-89)
  • WorkloadSpec (92-115)
  • LoRASpec (140-148)
  • LLMInferenceServiceConfig (52-57)
  • LLMInferenceService (40-46)
pkg/apis/serving/v1beta1/podspec.go (1)
  • PodSpec (28-339)
pkg/testing/log.go (1)
  • NewTestLogger (37-39)
pkg/controller/v1alpha1/llmisvc/config_merge.go (2)
  • MergeSpecs (283-298)
  • ReplaceVariables (234-262)
pkg/testing/httproute_matchers.go (1)
  • HaveGatewayRefs (54-58)
🪛 Ruff (0.14.5)
python/storage/kserve_storage/kserve_storage.py

792-792: Uses of tarfile.extractall()

(S202)


794-796: Avoid specifying long messages outside the exception class

(TRY003)
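The S202 finding flags `tarfile.extractall()` because extraction is vulnerable to path traversal ("tar-slip") when member names contain `..` or absolute paths. A minimal sketch of the usual mitigation — validating every member path before extracting, inside a context manager as the PR summary describes — where the helper name `safe_extract_tar` is illustrative and not the actual function in `kserve_storage.py`:

```python
import os
import tarfile


def safe_extract_tar(archive_path: str, dest_dir: str) -> None:
    """Extract a tar archive while rejecting members that escape dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    # The context manager guarantees the archive is closed even if
    # validation or extraction raises partway through.
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Reject any member whose resolved path lands outside dest_root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise RuntimeError(f"blocked path traversal: {member.name!r}")
        tar.extractall(dest_root)
```

On Python 3.12+, `tar.extractall(dest_root, filter="data")` performs equivalent member sanitization in the standard library and addresses the same traversal risk.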

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: precommit-check
  • GitHub Check: Test
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.9)
  • GitHub Check: test

- storageUri
type: object
type: array
x-kubernetes-preserve-unknown-fields: true

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Script executed:

# First, let's examine the CRD file around line 49 to see the full context
cat -n charts/llmisvc-crd/templates/serving.kserve.io_llminferenceserviceconfigs.yaml | sed -n '40,60p'

Repository: opendatahub-io/kserve


<shell_result>
Repository: opendatahub-io/kserve
Script execution failed with unrecoverable error: [CAPABILITY:RUN_SHELL_SCRIPT] Failed to clone repository opendatahub-io/kserve: Clone operation failed
</shell_result>


kserve llminferenceservice preserveUnknownFields adapters schema


<web_search_result>
Short answer: set the CRD schema to preserve unknown fields using the OpenAPI vendor extension (x-kubernetes-preserve-unknown-fields: true) on the appropriate openAPIV3Schema node that covers your adapters/LoRA/adapter-like subobject. The legacy CRD field preserveUnknownFields is deprecated — prefer x-kubernetes-preserve-unknown-fields in spec.versions[*].schema.openAPIV3Schema.

Why/how (concise):

  • KServe exposes LLMInferenceService as a CRD (so schema/pruning is controlled by the CRD’s openAPIV3Schema). [1]
  • Kubernetes supports x-kubernetes-preserve-unknown-fields in openAPIV3Schema to prevent pruning of arbitrary/unknown fields (and notes preserveUnknownFields is deprecated in favor of that extension). Use the extension at the root or the nested property that must remain schemaless. [2][3]

Example (conceptual): inside the CRD’s spec.versions[].schema.openAPIV3Schema place:
type: object
x-kubernetes-preserve-unknown-fields: true
(or add the extension under the specific property for adapters)

References:

  • KServe LLMInferenceService CRD docs. [1]
  • Kubernetes blog/docs on structural schemas and x-kubernetes-preserve-unknown-fields. [2]
  • Kubernetes API/CRD field deprecation note (preserveUnknownFields → x-kubernetes-preserve-unknown-fields). [3]
    </web_search_result>

opendatahub kserve CRD serving.kserve.io llminferenceserviceconfigs


<web_search_result>
Quick summary — what it is and where to find it

  • LLMInferenceServiceConfig is a KServe CRD (API group serving.kserve.io, v1alpha1) that provides reusable/template configuration for LLMInferenceService resources (i.e., base settings a concrete LLMInferenceService can inherit). It contains fields such as model spec (URI, LoRA adapters, storage), replicas, parallelism (tensor/pipeline), pod templates (template/worker) for single- or multi-node/disaggregated deployments. [1]

  • OpenDataHub bundles KServe for single-model serving (vLLM/etc.) and uses these KServe CRDs (ServingRuntime, InferenceService, and the newer LLMInferenceService/LLMInferenceServiceConfig) as part of its model-serving platform and documentation. See the OpenDataHub serving and deployment docs for examples and how KServe is installed/used within OpenDataHub. [2][3]

Useful links (reference)

  • KServe CRD API / Control Plane docs (LLMInferenceService and LLMInferenceServiceConfig schema). [1]
  • OpenDataHub: Serving models (KServe integration, runtimes, InferenceService examples). [2]
  • OpenDataHub: Deploying / configuring model-serving platform (installation and runtime notes). [3]

If you want, I can:

  • show an example YAML for an LLMInferenceServiceConfig (based on KServe docs), or
  • show how OpenDataHub deploys a vLLM ServingRuntime and references such configs.
    </web_search_result>

kserve LLMInferenceService adapters framework memory storageUri documentation


<web_search_result>
Summary of relevant KServe docs and notes about LLMInferenceService adapters, memory, and storageUri

  • LLMInferenceService docs (overview + tutorial) — describes the LLMInferenceService CRD, how it integrates llm-d, templates/adapters via spec.template (containers, resources including memory), and router/gateway/scheduler configuration. Example shows memory in container resource requests/limits. [1][2]

  • storageUri / URI handling — KServe supports model artifacts via storage URIs (http/https, s3, gcs, hf://, etc.) and documents using storageUri with secrets for headers; used by the storage initializer to download model artifacts. Example and usage notes are in the URI docs. [3]

  • Multiple storageUri / adapters (LoRA / adapters) — KServe currently supports a single storageUri per model; there is an open feature request to support multiple storageUri (for use cases like base model + LoRA adapters coming from different buckets). That issue is tracked on GitHub (Support multiple StorageUri in Inference Service kserve/kserve#3413). [4]

Practical implications

  • To set memory for LLM pods, specify resource.requests/limits.memory under spec.template.containers in LLMInferenceService (as shown in the LLM examples). [1][2]
  • For adapter/LoRA workflows where base model and adapters live separately, you’ll need a workaround (e.g., build a single artifact combining base+adapter or use init containers/custom storage init logic) until multi-storageUri support is implemented; follow or contribute to the GitHub issue for progress. [3][4]

Links (docs referenced)

  • LLMInferenceService overview & examples. [1][2]
  • storageUri / URI docs (storage initializer usage). [3]
  • GitHub issue tracking multiple storageUri feature. [4]

If you want, I can:

  • show an example LLMInferenceService YAML highlighting memory and storageUri fields, or
  • suggest a concrete workaround (initContainer, sidecar, or fused artifact) for using separate adapter files with a base model.
    </web_search_result>

site:github.com opendatahub kserve llminferenceservice adapter schema


<web_search_result>
I searched GitHub for KServe / OpenDataHub references to LLMInferenceService and adapter/schema. Relevant results:

  • KServe main repo (contains InferenceService / LLM work and CRD issues). [1]
  • Issue proposing/supporting OpenAI schema for KServe LLM runtime (generate/OpenAI-compatible API). [2]
  • KServe milestone and recent LLM-related issues (DistributedInferenceService / LLMInferenceService work). [3]
  • OpenDataHub operator releases that reference bundled KServe/ModelMesh runtime-adapter versions. [4]

References
[1] https://github.com/kserve/kserve
[2] kserve#3419
[3] kserve#4433
[4] https://github.com/opendatahub-io/opendatahub-operator/releases

If you want, I can open specific files (CRD schemas, adapter code, or the LLMInferenceService API) and list the exact schema fields or file paths. Which would you like me to fetch?
</web_search_result>


"x-kubernetes-preserve-unknown-fields" CRD adapter schema benefits use cases


<web_search_result>
Summary — what x-kubernetes-preserve-unknown-fields does

  • It's an OpenAPI vendor extension you place on an object-schema (or nested properties/additionalProperties) to tell the API server: do NOT prune/strip fields that are not described in the structural schema for that node. [1][2]

Benefits

  • Allows CRDs to carry arbitrary or opaque JSON blobs (runtime.RawExtension–like data) inside a structural schema. [1]
  • Enables backward/forward compatibility when controllers/operators may add new fields that not-yet-present clients or CRD schemas would otherwise prune. [1][3]
  • Lets you combine structural validation for most of a resource while permitting “escape hatches” for extensible subtrees (plugins, vendor extensions, unstructured data). [1][2]

Common use cases

  • Fields that must accept arbitrary JSON (e.g., embedded 3rd‑party resource specs, plugin configuration, policy blobs). [1]
  • Migrating legacy non-structural CRDs to structural schemas: mark parts with the extension so existing stored unknown fields survive conversion. [1][3]
  • Controllers that persist dynamic/unknown status or spec keys (adapters, operators that store discovered state). [2][3]
  • Implementing an “embedded resource” pattern (apiVersion/kind/metadata plus opaque body): use together with x-kubernetes-embedded-resource or preserve unknowns inside that subtree. [1]

Caveats / trade-offs

  • Using it weakens pruning/strict structural guarantees for that subtree: you lose automated pruning and some client/server validation assumptions. [1][2]
  • Overuse can reintroduce the problems structural schemas were meant to solve (inconsistent shapes, harder API evolution/testing). Prefer narrow, well-documented escape-hatches. [1][3]
  • For v1 CRDs you still must provide a structural schema overall; preserveUnknownFields at the CRD level is deprecated—prefer the vendor extension on specific schema nodes. [1][3]

References

  • Kubernetes CRD structural schema and vendor extensions (x-kubernetes-preserve-unknown-fields). [1]
  • Kubernetes API concepts: unknown-field pruning behavior. [2]
  • CRD docs: pruning vs preserving unknown fields and migration guidance. [3]
    </web_search_result>

Document the schema flexibility and scope of adapter configuration.

This change uses x-kubernetes-preserve-unknown-fields: true to support extensible adapter schemas—a standard Kubernetes pattern for plugin and custom configuration support. However, the PR lacks documentation of the intent and impact.

Before merging, add to the PR description or a code comment:

  • Why adapter schema is being made flexible (e.g., support for custom/vendor adapter types)
  • Whether framework, memory, storageUri are still documented and validated elsewhere (tests, runtime, controller logic)
  • Any migration guidance for users expecting strict schema validation
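
A narrowly scoped escape hatch could look like the following sketch, keeping the documented fields validated while permitting vendor-specific adapter keys (property names are illustrative, not copied from the actual serving.kserve.io CRD schemas):

```yaml
# Hypothetical openAPIV3Schema excerpt for an adapters-like subtree
properties:
  adapters:
    type: array
    items:
      type: object
      required:
        - storageUri
      properties:
        storageUri:
          type: string
      # Allow unknown vendor keys to survive pruning on each item,
      # while storageUri above remains structurally validated.
      x-kubernetes-preserve-unknown-fields: true
```

Placing the extension on the item schema rather than the CRD root limits the loss of pruning to just this subtree.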

Comment on lines +62 to +64
- --modelServerMetricsScheme
- "https"
- --modelServerMetricsHttpsInsecureSkipVerify

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Script executed:

#!/bin/bash
# First, locate and examine the file in question
find . -name "config-llm-scheduler.yaml" -type f | head -20

Repository: opendatahub-io/kserve


<shell_result>
Repository: opendatahub-io/kserve
Script execution failed with unrecoverable error: [CAPABILITY:RUN_SHELL_SCRIPT] Failed to clone repository opendatahub-io/kserve: Clone operation failed
</shell_result>


KServe modelServerMetricsHttpsInsecureSkipVerify TLS certificate verification


<web_search_result>
Short answer: That flag (modelServerMetricsHttpsInsecureSkipVerify) — when present in the inferenceservice-config/prometheus-related configuration — tells KServe to skip TLS certificate verification when contacting model servers’ /metrics endpoints over HTTPS. Enabling it disables hostname/CA checks for those metric scrapes (insecure), so the scrapers will accept self-signed/invalid certs. Use with caution; prefer providing a CA bundle (caBundle / caCertFile) instead or properly signed certs. See KServe docs on skipping TLS vs supplying CA bundles and the cluster-wide configmap settings. [1][2]

Sources

  • KServe: Configure CA bundle / skip SSL verification for components (describes using CA bundles and the skip-verify option). [1]
  • KServe: global inferenceservice-config documentation (shows cluster-wide config and examples for TLS-skip options such as tlsSkipVerify). [2]
    </web_search_result>

InsecureSkipVerify production Kubernetes TLS certificate verification best practices security


<web_search_result>
Short answer: Never use InsecureSkipVerify (or equivalent “skip TLS verification” flags) in production. It disables server certificate validation and allows trivial man‑in‑the‑middle, impersonation, and audit‑bypass attacks. (kubernetes.io)

Recommended production best practices

  • Use properly issued TLS certificates (public CA for externally reachable endpoints; internal PKI for private clusters). Distribute the issuing CA to clients instead of skipping verification. (kubernetes.io)
  • Enforce mutual TLS (mTLS) where appropriate so both client and server verify identities. (kubernetes.io)
  • Pin or validate expected server names and certificate chains (hostname verification + verify chain to a trusted CA). Don’t rely on raw certificate bytes without chain validation. (reddit.com)
  • Use Kubernetes-native TLS/secret management (cert-manager, in‑cluster CA, or your PKI) and automate certificate issuance/rotation. Validate certs through CI/CD and policy checks. (kubernetes.io)
  • Restrict network access to control‑plane endpoints (network policies, firewall rules, private API endpoints) so verified TLS is combined with network controls. (kubernetes.io)
  • Audit, logging and alerting: enable API server auditing and monitor for clients using insecure flags or unexpected certificate errors. (kubernetes.io)
  • Fail builds/helm charts or block deploys that include insecure flags (InsecureSkipVerify, skip-tls-verify, tls.insecureSkipVerify) via policy engines (e.g., OPA/Gatekeeper). (gateway.envoyproxy.io)

If you think you must temporarily skip verification (development/testing only)

  • Limit scope: only in dev/test namespaces or environments isolated from production. (gateway.envoyproxy.io)
  • Compensate with strict network isolation and short lifetimes for test credentials. Log and track any use. (hatchjs.com)
  • Prefer implementing custom VerifyPeerCertificate that re‑implements validation correctly (advanced, risky — only if you fully understand certificate verification and supply required trust anchors). (reddit.com)

Operational checklist to eliminate InsecureSkipVerify from production

  1. Inventory code/configs for insecure flags (grep CI, images, helm charts, manifests).
  2. Replace with trusted CA bundles or mTLS configurations.
  3. Automate cert issuance/rotation (cert-manager or PKI).
  4. Add policy checks to CI/CD to block insecure settings.
  5. Harden network access and enable API auditing/alerting. (kubernetes.io)

If you want, I can:

  • scan a repo/manifest set for insecure TLS flags, or
  • give concrete kubectl/helm/cert-manager examples to replace InsecureSkipVerify.


Based on my verification, the security concern in the original review comment is legitimate and accurate. Industry best practices (Kubernetes.io, KServe documentation, and security standards) explicitly recommend against using InsecureSkipVerify-style flags in production, as they disable certificate validation and enable man-in-the-middle attacks.

Here is the rewritten review comment:


Remove or conditionally scope --modelServerMetricsHttpsInsecureSkipVerify for production.

The scheduler disables TLS certificate verification for model server metrics over HTTPS. This creates a security vulnerability in production by allowing trivial man-in-the-middle attacks. Kubernetes and KServe best practices explicitly recommend against this.

Recommended fixes:

  1. Provide a CA bundle via KServe's caBundle / caCertFile configuration instead, or
  2. Use properly issued TLS certificates and enable verification, or
  3. If this flag must remain (development/testing only), restrict it to dev/test namespaces via Helm values and document the exception.
              - --modelServerMetricsScheme
              - "https"
              - --modelServerMetricsHttpsInsecureSkipVerify

Reference: KServe TLS configuration documentation.

🤖 Prompt for AI Agents
charts/llmisvc-resources/templates/config-llm-scheduler.yaml lines 62-64: the
template currently passes --modelServerMetricsHttpsInsecureSkipVerify which
disables TLS certificate verification; remove this flag from the production
template and instead (a) configure and mount a CA bundle or set the KServe
caCertFile/caBundle so metric scraping uses valid certs, or (b) require properly
issued TLS certificates on the model server so verification succeeds, and if you
must keep the flag for development only, add a Helm value (e.g.,
metrics.insecureSkipVerify=false by default) and conditionalize the flag in the
template so it is only rendered for non-production deployments with clear
documentation.
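
The value-gated approach from the remediation above could be sketched like this (the `scheduler.metrics.insecureSkipVerify` value name is illustrative, not part of the chart's current values schema):

```yaml
# values.yaml (illustrative):
#   scheduler:
#     metrics:
#       insecureSkipVerify: false   # default off for production

# config-llm-scheduler.yaml template fragment:
args:
  - --modelServerMetricsScheme
  - "https"
  {{- if .Values.scheduler.metrics.insecureSkipVerify }}
  # Dev/test only: disables TLS verification for metrics scraping.
  - --modelServerMetricsHttpsInsecureSkipVerify
  {{- end }}
```

With the default set to `false`, the insecure flag is never rendered unless a deployment opts in explicitly.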

template:
containers:
- image: ghcr.io/llm-d/llm-d:v0.2.0
- image: ghcr.io/llm-d/llm-d-dev:v0.2.2

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Script executed:

#!/bin/bash
# Find all image references in the repository to understand versioning pattern
rg "ghcr\.io/llm-d" -n -C 2

Repository: opendatahub-io/kserve


<shell_result>
Repository: opendatahub-io/kserve
Script execution failed with unrecoverable error: [CAPABILITY:RUN_SHELL_SCRIPT] Failed to clone repository opendatahub-io/kserve: Clone operation failed
</shell_result>


Script executed:

#!/bin/bash
# Check git history of this specific file to see previous image versions
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
git log -p --follow -S "ghcr.io/llm-d" -- "charts/llmisvc-resources/templates/config-llm-worker-data-parallel.yaml" | head -200

Repository: opendatahub-io/kserve


<shell_result>
Repository: opendatahub-io/kserve
Script execution failed with unrecoverable error: [CAPABILITY:RUN_SHELL_SCRIPT] Failed to clone repository opendatahub-io/kserve: Clone operation failed
</shell_result>


Script executed:

#!/bin/bash
# Search for any documentation or configuration about image versions
rg -i "(image|version|dev|release)" charts/llmisvc-resources/ -A 2 -B 2 | head -100

Repository: opendatahub-io/kserve


<shell_result>
Repository: opendatahub-io/kserve
Script execution failed with unrecoverable error: [CAPABILITY:RUN_SHELL_SCRIPT] Failed to clone repository opendatahub-io/kserve: Clone operation failed
</shell_result>


Script executed:

#!/bin/bash
# Look for values.yaml or default configurations that might specify image versions
find charts/llmisvc-resources -name "values.yaml" -o -name "Chart.yaml" -o -name "README*" | xargs cat 2>/dev/null

Repository: opendatahub-io/kserve


<shell_result>
Repository: opendatahub-io/kserve
Script execution failed with unrecoverable error: [CAPABILITY:RUN_SHELL_SCRIPT] Failed to clone repository opendatahub-io/kserve: Clone operation failed
</shell_result>


ghcr.io llm-d llm-d-dev image versions releases


<web_search_result>
Summary — llm-d images and releases (GHCR)

  • Official project release: llm-d v0.3.0 (release date Oct 10, 2025). Component images published under ghcr.io/llm-d (examples: ghcr.io/llm-d/llm-d-cuda:v0.3.0, ghcr.io/llm-d/llm-d-gke:v0.3.0). [1]

  • Older published image tag example: ghcr.io/llm-d/llm-d:v0.2.0 (image metadata / digest available). [2]

  • Many subprojects use a "dev" image tag for local/dev builds (e.g., ghcr.io/llm-d/llm-d-inference-sim:dev) and documentation notes replacing :dev with the current release tag for stable usage. [3]

If you want, I can:

  • list all GHCR image tags for a specific llm-d repository (requires querying the registry), or
  • fetch the full release changelog and per-repo image tags. Which would you prefer?

Sources:
[1] llm-d GitHub Releases (v0.3.0).
[2] ghcr.io/llm-d/llm-d:v0.2.0 image metadata mirror.
[3] llm-d/llm-d-inference-sim README (notes about :dev and release tags).
</web_search_result>


Use a stable release tag instead of the -dev variant, or confirm this is intentional staging.

The web search confirms that the llm-d project documentation explicitly recommends replacing :dev image tags with current release tags for stable deployments. The -dev variants are designated for local/development builds.

Additionally, the project has published a v0.3.0 stable release (Oct 2025), which is newer than the v0.2.2 variant referenced here. Using ghcr.io/llm-d/llm-d-dev:v0.2.2 in a production Kubernetes template deviates from project best practices and introduces risk of using development-stage code.

Recommended action: Switch to a stable release tag (e.g., ghcr.io/llm-d/llm-d:v0.2.2 or consider upgrading to v0.3.0), or explicitly document this as a staging/testing deployment.

Also applies to: 106-106
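
One way to make the tag choice explicit and auditable is to parameterize the image in chart values (key names here are illustrative, not the chart's actual schema):

```yaml
# values.yaml (illustrative):
#   llmd:
#     image:
#       repository: ghcr.io/llm-d/llm-d
#       tag: v0.2.0   # pin a stable release tag, not a -dev build

# template fragment:
containers:
  - image: "{{ .Values.llmd.image.repository }}:{{ .Values.llmd.image.tag }}"
```

This keeps any intentional use of a dev build visible in a single values override rather than buried in the template.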

Comment on lines +1250 to +1252
{
name: "merge pod spec, add container",
cfgs: []v1alpha1.LLMInferenceServiceSpec{

⚠️ Potential issue | 🟡 Minor

Duplicate test case name.

Test case at line 1251 has the same name "merge pod spec, add container" as the test case at line 1190. This can cause confusion when debugging test failures and may mask issues if only one is failing.

 		{
-			name: "merge pod spec, add container",
+			name: "merge pod spec, add container with empty override",
 			cfgs: []v1alpha1.LLMInferenceServiceSpec{
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{
name: "merge pod spec, add container",
cfgs: []v1alpha1.LLMInferenceServiceSpec{
{
name: "merge pod spec, add container with empty override",
cfgs: []v1alpha1.LLMInferenceServiceSpec{
🤖 Prompt for AI Agents
In pkg/controller/v1alpha1/llmisvc/config_merge_test.go around lines 1250-1252
(the test case starting at line 1251), there is a duplicate test case name
"merge pod spec, add container" which is identical to the one at line 1190;
rename this test case to a unique, descriptive name (e.g., "merge pod spec, add
container - new container" or "merge pod spec, add container (with
initContainer)") so t.Run outputs are distinct, and update any related
expectations or references that rely on the test name.

Comment on lines 67 to 71
COPY --from=builder --chown=kserve:kserve third_party third_party
COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
COPY --from=builder kserve kserve
COPY --from=builder storage storage
COPY --from=builder paddleserver paddleserver

⚠️ Potential issue | 🟠 Major

Inconsistent --chown flags on copied directories.

Lines 67-68 apply --chown=kserve:kserve to third_party and $VIRTUAL_ENV, but lines 69-70 omit the flag for kserve and storage. Since the container runs as non-root user kserve (line 65), all directories should consistently use --chown=kserve:kserve to avoid permission issues at runtime.

Apply this diff to add missing --chown flags:

 COPY --from=builder --chown=kserve:kserve third_party third_party
 COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
-COPY --from=builder kserve kserve
-COPY --from=builder storage storage
+COPY --from=builder --chown=kserve:kserve kserve kserve
+COPY --from=builder --chown=kserve:kserve storage storage
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
COPY --from=builder --chown=kserve:kserve third_party third_party
COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
COPY --from=builder kserve kserve
COPY --from=builder storage storage
COPY --from=builder paddleserver paddleserver
COPY --from=builder --chown=kserve:kserve third_party third_party
COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
COPY --from=builder --chown=kserve:kserve kserve kserve
COPY --from=builder --chown=kserve:kserve storage storage
COPY --from=builder paddleserver paddleserver
🤖 Prompt for AI Agents
In python/paddle.Dockerfile around lines 67 to 71, the COPY lines for kserve,
storage, and paddleserver omit the --chown=kserve:kserve flag while other copies
set it, causing inconsistent ownership for a container that runs as non-root
kserve; update the COPY commands for kserve, storage, and paddleserver to
include --chown=kserve:kserve so all copied directories/files are owned by
kserve:kserve and avoid runtime permission issues.

Comment on lines 71 to +81
COPY --from=builder --chown=kserve:kserve third_party third_party
COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
COPY --from=builder kserve kserve
COPY --from=builder storage storage

⚠️ Potential issue | 🟠 Major

Inconsistent --chown flags on copied directories.

Lines 78-79 apply --chown=kserve:kserve to third_party and $VIRTUAL_ENV, but lines 80-81 omit the flag for kserve and storage. Since the container runs as non-root user kserve (line 84), all directories should consistently use --chown=kserve:kserve to avoid permission issues at runtime.

Apply this diff to add missing --chown flags:

 COPY --from=builder --chown=kserve:kserve third_party third_party
 COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
-COPY --from=builder kserve kserve
-COPY --from=builder storage storage
+COPY --from=builder --chown=kserve:kserve kserve kserve
+COPY --from=builder --chown=kserve:kserve storage storage
📝 Committable suggestion


Suggested change
 COPY --from=builder --chown=kserve:kserve third_party third_party
 COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
-COPY --from=builder kserve kserve
-COPY --from=builder storage storage
+COPY --from=builder --chown=kserve:kserve kserve kserve
+COPY --from=builder --chown=kserve:kserve storage storage
🤖 Prompt for AI Agents
In python/pmml.Dockerfile around lines 78 to 81, the COPY instructions are
inconsistent: third_party and $VIRTUAL_ENV use --chown=kserve:kserve but kserve
and storage do not, which can cause permission errors when running as non-root
user kserve; update the two COPY lines for kserve and storage to include
--chown=kserve:kserve so all copied directories are owned by kserve:kserve at
build time.

Comment on lines 62 to 66
COPY --from=builder --chown=kserve:kserve third_party third_party
COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
COPY --from=builder kserve kserve
COPY --from=builder storage storage
COPY --from=builder sklearnserver sklearnserver

⚠️ Potential issue | 🟠 Major

Inconsistent --chown flags on copied directories.

Lines 62-63 apply --chown=kserve:kserve to third_party and $VIRTUAL_ENV, but lines 64-65 omit the flag for kserve and storage. Since the container runs as non-root user kserve (line 60), all directories should consistently use --chown=kserve:kserve to avoid permission issues at runtime.

Apply this diff to add missing --chown flags:

 COPY --from=builder --chown=kserve:kserve third_party third_party
 COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
-COPY --from=builder kserve kserve
-COPY --from=builder storage storage
+COPY --from=builder --chown=kserve:kserve kserve kserve
+COPY --from=builder --chown=kserve:kserve storage storage
📝 Committable suggestion


Suggested change
 COPY --from=builder --chown=kserve:kserve third_party third_party
 COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
-COPY --from=builder kserve kserve
-COPY --from=builder storage storage
+COPY --from=builder --chown=kserve:kserve kserve kserve
+COPY --from=builder --chown=kserve:kserve storage storage
 COPY --from=builder sklearnserver sklearnserver
🤖 Prompt for AI Agents
In python/sklearn.Dockerfile around lines 62-66, several COPY lines lack the
--chown=kserve:kserve flag causing inconsistent ownership; update the COPY
commands that omit --chown (COPY --from=builder kserve kserve, COPY
--from=builder storage storage, and COPY --from=builder sklearnserver
sklearnserver) to include --chown=kserve:kserve so all copied directories are
owned by the non-root kserve user.

COPY --from=builder --chown=kserve:kserve third_party third_party
COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
COPY --from=builder kserve kserve
COPY --from=builder storage storage

⚠️ Potential issue | 🟠 Major

Add missing --chown flag for consistency.

Line 68 is missing the --chown=kserve:kserve flag that's applied to other copied artifacts (lines 66–67). Since the kserve user is set as the active user at line 77, the storage directory should have consistent ownership to avoid potential permission issues at runtime.

-COPY --from=builder storage storage
+COPY --from=builder --chown=kserve:kserve storage storage
📝 Committable suggestion


Suggested change
-COPY --from=builder storage storage
+COPY --from=builder --chown=kserve:kserve storage storage
🤖 Prompt for AI Agents
In python/storage-initializer.Dockerfile around line 68, the COPY of the storage
directory from the builder stage lacks the --chown=kserve:kserve flag used for
other artifacts; update that COPY instruction to include --chown=kserve:kserve
so the storage directory is owned by the kserve user (matching lines 66–67) to
avoid permission issues when USER kserve is set later.

COPY storage/pyproject.toml storage/uv.lock storage/
RUN cd storage && uv sync --active --no-cache

COPY storage storage

⚠️ Potential issue | 🟡 Minor

Remove trailing whitespace.

Line 31 has unnecessary trailing whitespace after the COPY instruction.

Apply this diff:

-COPY storage storage 
+COPY storage storage
📝 Committable suggestion


Suggested change
-COPY storage storage 
+COPY storage storage
🤖 Prompt for AI Agents
In python/xgb.Dockerfile around line 31, the COPY instruction has trailing
whitespace; remove the extra space at the end of the line so it reads without
any trailing whitespace (ensure the line ends immediately after "COPY storage
storage").
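Trailing whitespace like this is easy to strip mechanically rather than by hand. A minimal sketch using GNU `sed` in-place editing (the sample file path is illustrative, not the actual repo path):

```shell
# Write a sample line that ends in a trailing space, then strip it in place.
printf 'COPY storage storage \n' > /tmp/sample.Dockerfile
sed -i 's/[[:space:]]*$//' /tmp/sample.Dockerfile
cat /tmp/sample.Dockerfile
```

Note that BSD/macOS `sed -i` requires an explicit backup suffix argument (e.g. `sed -i ''`), so the exact invocation depends on the build environment.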

Comment on lines 63 to 67
COPY --from=builder --chown=kserve:kserve third_party third_party
COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
COPY --from=builder kserve kserve
COPY --from=builder storage storage
COPY --from=builder xgbserver xgbserver

⚠️ Potential issue | 🟠 Major

Inconsistent --chown flags on copied directories.

Lines 63-64 apply --chown=kserve:kserve to third_party and $VIRTUAL_ENV, but lines 65-66 omit the flag for kserve and storage. Since the container runs as non-root user kserve (line 61), all directories should consistently use --chown=kserve:kserve to avoid permission issues at runtime.

Apply this diff to add missing --chown flags:

 COPY --from=builder --chown=kserve:kserve third_party third_party
 COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
-COPY --from=builder kserve kserve
-COPY --from=builder storage storage
+COPY --from=builder --chown=kserve:kserve kserve kserve
+COPY --from=builder --chown=kserve:kserve storage storage
📝 Committable suggestion


Suggested change
 COPY --from=builder --chown=kserve:kserve third_party third_party
 COPY --from=builder --chown=kserve:kserve $VIRTUAL_ENV $VIRTUAL_ENV
-COPY --from=builder kserve kserve
-COPY --from=builder storage storage
+COPY --from=builder --chown=kserve:kserve kserve kserve
+COPY --from=builder --chown=kserve:kserve storage storage
 COPY --from=builder xgbserver xgbserver
🤖 Prompt for AI Agents
In python/xgb.Dockerfile around lines 63 to 67, the COPY lines are inconsistent:
third_party and $VIRTUAL_ENV use --chown=kserve:kserve but kserve, storage, and
xgbserver do not, which can cause permission issues when running as non-root
kserve; update the COPY instructions for kserve, storage, and xgbserver to
include --chown=kserve:kserve so all copied directories are owned by
kserve:kserve at build time and avoid runtime permission errors.

@openshift-ci

openshift-ci bot commented Nov 28, 2025

@mholder6: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-graph
Commit: 6ca238d
Details: link
Required: true
Rerun command: /test e2e-graph

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
