Changes from 9 commits
1 change: 1 addition & 0 deletions .flake8
@@ -15,6 +15,7 @@ exclude =
python/kserve/test/__init__.py,
python/kserve/test/test_knative*.py,
python/kserve/kserve/protocol/grpc/grpc_predict_v2*.py
python/kserve/build/**/*.py
python/*_pb2.py
docs/**/*.py
python/kserve/kserve/protocol/rest/openai/types/openapi.py
12 changes: 11 additions & 1 deletion .github/workflows/e2e-test.yml
@@ -415,7 +415,12 @@ jobs:
- name: Run E2E tests
timeout-minutes: 30
run: |
./test/scripts/gh-actions/run-e2e-tests.sh "transformer or mms or collocation or explainer" "6"
./test/scripts/gh-actions/run-e2e-tests.sh "transformer or mms or collocation" "6"

- name: Run E2E tests - explainer
timeout-minutes: 30
run: |
./test/scripts/gh-actions/run-e2e-tests.sh "explainer" "1"

- name: Check system status
if: always()
@@ -583,6 +588,11 @@ jobs:
run: |
./test/scripts/gh-actions/run-e2e-tests.sh "path_based_routing" "6"

- name: Run E2E tests with path-based routing - Explainer
timeout-minutes: 30
run: |
./test/scripts/gh-actions/run-e2e-tests.sh "explainer" "1"

- name: Check system status
if: always()
run: |
5 changes: 5 additions & 0 deletions .github/workflows/python-test.yml
@@ -74,6 +74,11 @@ jobs:
cd python
source kserve/.venv/bin/activate
pytest --cov=kserve ./kserve
- name: Test kserve Storage
run: |
cd python
source kserve/.venv/bin/activate
pytest --cov=storage ./storage

# ----------------------------------------Kserve Numpy 1.x Unit Tests--------------------------------------------
- name: Setup kserve numpy 1-x directory
7 changes: 6 additions & 1 deletion .github/workflows/scheduled-image-scan.yml
@@ -14,6 +14,7 @@ jobs:
name: scan images
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
image:
[
@@ -53,15 +54,16 @@

- name: Upload sarif file to Github Code Scanning
if: always()
continue-on-error: true # avoid failing the pipeline if the SARIF upload fails
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: application/${{ matrix.image.name }}/docker.snyk.sarif
category: ${{ matrix.image.name }}

predictor-image-scan:
name: scan predictor images
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
image:
[
@@ -104,11 +106,13 @@
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: application/${{ matrix.image.name }}/docker.snyk.sarif
category: ${{ matrix.image.name }}

explainer-image-scan:
name: scan explainer images
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
image: [{ name: art-explainer, file: python/artexplainer.Dockerfile }]

@@ -143,3 +147,4 @@ jobs:
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: application/${{ matrix.image.name }}/docker.snyk.sarif
category: ${{ matrix.image.name }}
@@ -46,24 +46,7 @@ spec:
lora:
properties:
adapters:
items:
properties:
framework:
type: string
memory:
anyOf:
- type: integer
- type: string
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
storageUri:
type: string
required:
- framework
- memory
- storageUri
type: object
type: array
x-kubernetes-preserve-unknown-fields: true
⚠️ Potential issue | 🟡 Minor

Document the schema flexibility and scope of adapter configuration.

This change replaces the strict adapter item schema with x-kubernetes-preserve-unknown-fields: true, a standard Kubernetes pattern for supporting extensible plugin and custom configuration. However, the PR does not document the intent or the impact of this relaxation.

Before merging, add to the PR description or a code comment:

  • Why adapter schema is being made flexible (e.g., support for custom/vendor adapter types)
  • Whether framework, memory, storageUri are still documented and validated elsewhere (tests, runtime, controller logic)
  • Any migration guidance for users expecting strict schema validation
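For illustration, a minimal sketch of what the relaxed schema now admits, assuming the manifest path spec.model.lora.adapters matches this CRD; the vendorOptions field is hypothetical and only shown to demonstrate that unknown fields are no longer pruned:

```yaml
# Hypothetical LLMInferenceServiceConfig fragment. With
# x-kubernetes-preserve-unknown-fields: true on the adapters node,
# the API server keeps fields beyond the old required trio
# (framework, memory, storageUri) instead of pruning them.
spec:
  model:
    lora:
      adapters:
        - framework: vllm                  # previously required by the schema
          memory: 2Gi                      # previously int-or-string quantity
          storageUri: s3://bucket/adapter  # previously required by the schema
          vendorOptions:                   # hypothetical extra field, now preserved
            quantization: int8
```

Note the trade-off: typos in the formerly validated fields (e.g. a malformed memory quantity) will now be accepted at admission time and must be caught by controller logic instead.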

type: object
name:
type: string
@@ -65,24 +65,7 @@ spec:
lora:
properties:
adapters:
items:
properties:
framework:
type: string
memory:
anyOf:
- type: integer
- type: string
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
storageUri:
type: string
required:
- framework
- memory
- storageUri
type: object
type: array
x-kubernetes-preserve-unknown-fields: true
type: object
name:
type: string
20 changes: 16 additions & 4 deletions charts/llmisvc-resources/templates/config-llm-decode-template.yaml
@@ -5,7 +5,7 @@ metadata:
spec:
template:
containers:
- image: ghcr.io/llm-d/llm-d:v0.2.0
- image: ghcr.io/llm-d/llm-d-dev:v0.2.2
imagePullPolicy: IfNotPresent
name: main
ports:
@@ -14,6 +14,7 @@ spec:
command:
- vllm
- serve
- /mnt/models
args:
- --served-model-name
- "{{ .Spec.Model.Name }}"
@@ -34,9 +35,13 @@
value: /models
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: false
runAsNonRoot: true
capabilities:
drop:
- MKNOD
- ALL
seccompProfile:
type: RuntimeDefault
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
livenessProbe:
@@ -76,7 +81,13 @@ spec:
- containerPort: 8000
protocol: TCP
resources: { }
securityContext: { }
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: false
runAsNonRoot: true
capabilities:
drop:
- ALL
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
livenessProbe:
@@ -100,6 +111,7 @@ spec:
args:
- "--port=8000"
- "--vllm-port=8001"
- "--connector=nixlv2"
- "--secure-proxy=true"
- "--cert-path=/etc/ssl/certs"
- "--decoder-use-tls=true"
@@ -128,4 +140,4 @@ spec:
name: model-cache
- name: tls-certs
secret:
secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"