[RHOAIENG-36673] - Investigate and Refactor KServe E2E Test Suite for… #987
Conversation
Walkthrough

Changed the default InferenceService deployment mode to RawDeployment, added networking-visibility constants and labels across tests, switched some tests from Knative service reads to Kubernetes Deployment reads, introduced env-driven InferenceService endpoint selection, and updated OpenShift CI gating and cert checks.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Suite
    participant Utils as test/e2e/common/utils.py
    participant Env as Environment
    participant K8s as Kubernetes API
    Note over Test,Utils: Resolve InferenceService endpoint
    Test->>Utils: get_isvc_endpoint(isvc_name, network_layer)
    Utils->>Env: read CI_USE_ISVC_HOST
    alt CI_USE_ISVC_HOST == "1"
        Utils->>Env: read external host env vars
        Utils-->>Test: return external host endpoint
    else network_layer == "istio" or "istio-ingress"
        Utils->>K8s: get_cluster_ip()
        K8s-->>Utils: cluster IP
        Utils-->>Test: return internal cluster IP endpoint
    else
        Utils->>Utils: fallback logic (existing)
        Utils-->>Test: return fallback endpoint
    end
```
```mermaid
sequenceDiagram
    participant Test as Test Suite
    participant K8s as Kubernetes API
    Note over Test,K8s: Deployment readiness (RawDeployment)
    Test->>K8s: read_namespaced_deployment(service_name, namespace)
    K8s-->>Test: V1Deployment
    Test->>Test: wait up to 60s for status.ready_replicas > 0
    alt ready
        Test->>Test: extract containers & probes (container.liveness_probe, container.readiness_probe)
    else timeout
        Test->>Test: log "Timeout waiting for deployment to be ready"
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🧹 Nitpick comments (11)
test/scripts/openshift-ci/run-e2e-tests.sh (1)
63-67: Strengthen CA bundle check to ensure a non-empty file

Using `-f` only verifies that `/tmp/ca.crt` exists; it can still be empty if extraction failed earlier. Using `-s` makes the guard more robust.

```diff
-# Use certify go module to get the CA certs
-if [ ! -f "/tmp/ca.crt" ]; then
+# Use certify go module to get the CA certs
+if [ ! -s "/tmp/ca.crt" ]; then
   echo "❌ Error: CA certificate file '/tmp/ca.crt' not found. Please ensure the setup script ran successfully."
   exit 1
 fi
```

test/e2e/predictor/test_lightgbm.py (1)
57-61: Networking visibility labels for LightGBM services look correct

Adding `constants.KSERVE_LABEL_NETWORKING_VISIBILITY: constants.KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` to all LightGBM InferenceServices aligns them with the RawDeployment exposure model and keeps behavior consistent across tests.

If you touch this again, you could consider a small helper for the common `V1ObjectMeta(name, namespace, labels=...)` pattern to reduce duplication across predictor tests, but it's not required for this PR.

Also applies to: 100-104, 159-163, 216-220
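To make the helper idea concrete, here is a minimal sketch; the `exposed_labels` name is hypothetical, and the label key/value strings are assumptions mirroring the constants this PR introduces, not an existing KServe API.

```python
# Hypothetical helper for the repeated labels pattern; the key/value strings
# below mirror the networking-visibility constants added in this PR.
KSERVE_LABEL_NETWORKING_VISIBILITY = "networking.kserve.io/visibility"
KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED = "exposed"


def exposed_labels(extra=None):
    """Return the default 'exposed' labels, merged with test-specific extras."""
    labels = {KSERVE_LABEL_NETWORKING_VISIBILITY: KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED}
    if extra:
        labels.update(extra)
    return labels


# Each test could then build its metadata as, e.g.:
# client.V1ObjectMeta(name=name, namespace=ns, labels=exposed_labels())
```
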
config/overlays/odh/inferenceservice-config-patch.yaml (1)
80-80: Default deployment mode switch to `RawDeployment` matches the new test strategy

Updating `"defaultDeploymentMode"` to `"RawDeployment"` in the ODH overlay is coherent with the rest of this PR (tests and setup moving away from Knative). Just be aware this affects any consumers of this overlay beyond CI; if they still need serverless defaults, they'll now have to override explicitly.

test/e2e/predictor/test_canary.py (1)
36-36: Consider a conditional skip instead of permanently disabling canary tests

Unconditionally skipping both canary tests will disable them even in environments that do have Knative Serving. If these tests are still valuable upstream, consider a `skipif` based on deployment mode or a dedicated pytest marker so they can still run in serverless setups.

Please confirm whether these tests are intended to be disabled globally or only for RawDeployment; if the latter, wiring them to your existing serverless/raw gating would be safer.

Also applies to: 101-101
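One possible shape for that gating, sketched with a hypothetical `DEPLOYMENT_MODE` environment variable (the real suite would wire this to its own configuration):

```python
import os

import pytest

# Hypothetical env var; the real suite would read its own deployment-mode config.
IS_RAW_DEPLOYMENT = os.environ.get("DEPLOYMENT_MODE", "RawDeployment") == "RawDeployment"

# Reusable marker: tests decorated with this run only when Knative is present.
requires_knative = pytest.mark.skipif(
    IS_RAW_DEPLOYMENT,
    reason="Canary traffic splitting requires Knative Serving",
)


@requires_knative
def test_canary_rollout():
    """Placeholder body; the real canary test logic would stay unchanged."""
```
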
test/e2e/predictor/test_sklearn.py (2)
325-325: Use of `protocol_version="grpc-v2"` matches the GRPC v2 enum value

Explicitly setting `protocol_version="grpc-v2"` on the GRPC sklearn runtime looks correct and aligns with `PredictorProtocol.GRPC_V2`. As a minor improvement, you could consider referencing `constants.PredictorProtocol.GRPC_V2.value` to avoid string drift.

Please verify against your deployed runtime/docs that `"grpc-v2"` is still the expected protocol identifier.
446-447: Optional: align mixed GRPC test metadata with new visibility labeling

`test_sklearn_v2_mixed_grpc` remains skipped but its InferenceService metadata still lacks the new networking visibility label, unlike the other sklearn tests. For future readiness and consistency, you may want to add the same label here as well.

If you plan to re-enable this GRPC test later, consider updating its metadata now to avoid RawDeployment routing issues when that happens.
test/e2e/common/utils.py (1)
311-321: CI_USE_ISVC_HOST override is straightforward and preserves existing behavior

Using `CI_USE_ISVC_HOST == "1"` to short-circuit to the external host while keeping the `network_layer`-based paths unchanged for istio/gateway modes is clear and low risk. The additional logging around env, host, and chosen cluster_ip should help a lot when debugging CI routing issues.

If you foresee non-`"1"` truthy values (e.g. `"true"`), you might later generalize the check, but it's not required now.
If you foresee non-"1"truthy values (e.g."true"), you might later generalize the check, but it’s not required now.test/e2e/predictor/test_triton.py (2)
71-76: Label-based service discovery for debugging is a good improvement

Switching the failure diagnostics to `list_namespaced_service` with `serving.kserve.io/inferenceservice=<service_name>` and iterating services is more robust than relying on a single hard-coded Service name, especially across modes.

You might eventually replace `print(svc)` with logger calls for consistency, but the current change is already a net improvement.

Also applies to: 161-167
98-99: Transformer test image/env wiring is clearer; consider using pytest semantics for missing env

The updated skip reason, explicit `IMAGE_TRANSFORMER_IMG_TAG` check, and wiring `transformer_image` into the container (with a named 8080/http1 port) make the transformer test configuration much clearer and CI-friendly.

As a minor refinement, when this test is re-enabled you might convert the missing env case to `pytest.skip(...)` instead of raising `ValueError`, so it's reported as an environment precondition rather than a test failure.

Also applies to: 116-122, 127-130
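The suggested refinement could look like this sketch; `IMAGE_TRANSFORMER_IMG_TAG` is the env var named in the review, while the helper name and skip message are illustrative.

```python
import os

import pytest


def transformer_image_or_skip():
    """Skip the calling test when the transformer image tag is not configured.

    Treats the missing env var as an environment precondition rather than a
    failure; pytest.skip raises a special exception pytest reports as a skip.
    """
    tag = os.environ.get("IMAGE_TRANSFORMER_IMG_TAG")
    if not tag:
        pytest.skip("IMAGE_TRANSFORMER_IMG_TAG not set; skipping transformer test")
    return tag
```
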
test/e2e/predictor/test_multi_container_probing.py (2)
133-135: RawDeployment readiness and probe verification logic looks solid

- Adding the networking visibility label to the InferenceService metadata aligns this test with the rest of the suite.
- Using `AppsV1Api` + `TimeoutSampler` to wait until `deployment.status.ready_replicas > 0` before reading the final Deployment and inspecting `spec.template.spec.containers` is a good adaptation for RawDeployment.
- Selecting `kserve-container` and `kserve-agent` by `container.name` and asserting non-None liveness/readiness probes on both containers matches the intent of the test.

You could reuse the `deployment` from the sampler instead of doing a second `get_deployment` call, but that's a micro-optimization and not necessary.

Also applies to: 145-176
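The wait-then-reuse idea can be sketched without `TimeoutSampler`; the helper name and signature here are illustrative, and `get_deployment` is any zero-argument callable returning a Deployment-like object.

```python
import time


def wait_for_ready_replicas(get_deployment, timeout=60, interval=2, sleep=time.sleep):
    """Poll until the Deployment reports at least one ready replica.

    Returns the sampled Deployment object, so callers can inspect
    spec.template.spec.containers without a second API read.
    """
    deadline = time.monotonic() + timeout
    while True:
        deployment = get_deployment()
        ready = deployment.status.ready_replicas
        if ready and ready > 0:
            return deployment  # reuse the sampled object; no re-fetch needed
        if time.monotonic() >= deadline:
            raise TimeoutError("Timeout waiting for deployment to be ready")
        sleep(interval)
```
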
179-179: Prefer `logger.exception` here to capture the stack trace

In the timeout handler, switching from `logger.error("Timeout waiting for deployment to be ready")` to `logger.exception(...)` would automatically include the traceback for the `TimeoutExpiredError`, which can be very helpful when debugging CI flakiness, especially since you immediately re-raise.

```diff
-    except TimeoutExpiredError as e:
-        logger.error("Timeout waiting for deployment to be ready")
+    except TimeoutExpiredError as e:
+        logger.exception("Timeout waiting for deployment to be ready")
         raise e
```
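A small self-contained demo of the difference, using only the standard logging module (a `RuntimeError` stands in for `TimeoutExpiredError`):

```python
import io
import logging

logger = logging.getLogger("e2e-demo")
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))

try:
    raise RuntimeError("deployment not ready")  # stand-in for TimeoutExpiredError
except RuntimeError:
    # logger.exception logs at ERROR level and appends the active traceback;
    # logger.error with the same message would omit the stack trace.
    logger.exception("Timeout waiting for deployment to be ready")

output = stream.getvalue()
```

After this runs, `output` contains both the message and a full `Traceback (most recent call last):` block.
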
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
- config/overlays/odh/inferenceservice-config-patch.yaml (1 hunks)
- config/overlays/test/dsc.yaml (1 hunks)
- python/kserve/kserve/constants/constants.py (1 hunks)
- test/e2e/batcher/test_batcher.py (1 hunks)
- test/e2e/batcher/test_batcher_custom_port.py (1 hunks)
- test/e2e/common/utils.py (2 hunks)
- test/e2e/logger/test_logger.py (3 hunks)
- test/e2e/predictor/test_autoscaling.py (5 hunks)
- test/e2e/predictor/test_canary.py (4 hunks)
- test/e2e/predictor/test_lightgbm.py (4 hunks)
- test/e2e/predictor/test_multi_container_probing.py (3 hunks)
- test/e2e/predictor/test_paddle.py (3 hunks)
- test/e2e/predictor/test_pmml.py (3 hunks)
- test/e2e/predictor/test_sklearn.py (8 hunks)
- test/e2e/predictor/test_tensorflow.py (2 hunks)
- test/e2e/predictor/test_triton.py (7 hunks)
- test/e2e/predictor/test_xgboost.py (7 hunks)
- test/e2e/storagespec/test_s3_storagespec.py (1 hunks)
- test/scripts/openshift-ci/deploy.serverless.sh (1 hunks)
- test/scripts/openshift-ci/run-e2e-tests.sh (1 hunks)
- test/scripts/openshift-ci/setup-e2e-tests.sh (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
test/e2e/predictor/test_paddle.py (1)
python/kserve/kserve/models/v1beta1_inference_service.py (2)
metadata (136-143), metadata (146-154)
test/e2e/common/utils.py (1)
test/e2e/conftest.py (1)
network_layer(80-81)
test/e2e/predictor/test_triton.py (1)
python/kserve/kserve/api/kserve_client.py (1)
get(166-223)
🪛 Ruff (0.14.5)
test/e2e/predictor/test_multi_container_probing.py
179-179: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: build (3.9)
- GitHub Check: build (3.10)
- GitHub Check: test
- GitHub Check: build (3.12)
- GitHub Check: build (3.11)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: precommit-check
- GitHub Check: test
🔇 Additional comments (18)
config/overlays/test/dsc.yaml (1)
22-23: Adding the NIM component with `managementState: Removed` aligns with test simplification

Disabling the NIM component is consistent with the PR objective to focus e2e tests on RawDeployment mode only, reducing deployment complexity.
test/scripts/openshift-ci/deploy.serverless.sh (1)
175-181: Ingress TLS secret selection logic is more robust

Preferring `default-ingress-cert` when available and cleanly falling back to `router-certs-default` makes this script safer across different OpenShift configurations. The control flow under `set -eu` looks correct.

test/scripts/openshift-ci/setup-e2e-tests.sh (1)
28-40: Serverless gating + apply semantics are cleaner and align with RawDeployment focus
- `MARKERS_SKIP_SERVERLESS` plus `skip_serverless()` centralize the "raw/graph/predictor don't need serverless" logic and remove the scattered regex checks. The regex match preserves the old substring semantics while making future extensions trivial.
- Using `skip_serverless` consistently for:
  - deciding whether to install OSSM/Serverless,
  - choosing raw vs serverless `inferenceservice-config` patches,
  - installing authorino/gateways,
  - wiring the test namespace into ServiceMesh,
  - and adding the Knative passthrough annotation

  keeps the flow much easier to reason about.
- Switching the main KServe install to `oc apply --server-side=true --force-conflicts -f -` and adding the explicit `odh-model-controller` readiness log/`oc wait` improve robustness for CI runs where resources may already exist or be reconciling.

This all looks consistent with the move to RawDeployment-by-default while still keeping serverless paths available where explicitly desired.
Also applies to: 110-116, 129-129, 136-146, 152-158, 164-165, 227-239, 251-253
test/e2e/predictor/test_paddle.py (1)
55-61: Paddle InferenceServices correctly use networking visibility labels

The added `labels` blocks using `constants.KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` on the three Paddle services are consistent with the rest of the suite and should ensure they remain reachable under RawDeployment.

Also applies to: 109-115, 164-170
test/e2e/storagespec/test_s3_storagespec.py (1)
56-60: S3 storagespec test now aligned with visibility labeling

Adding `constants.KSERVE_LABEL_NETWORKING_VISIBILITY: constants.KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` to this InferenceService metadata keeps its exposure model consistent with other RawDeployment tests.

test/e2e/predictor/test_autoscaling.py (1)
53-54: Knative autoscaling tests are clearly marked as skipped

Marking the Knative-specific autoscaling tests as skipped with explicit reasons clarifies that coverage has intentionally shifted away from Knative, while keeping the test code around for potential future reuse. The small metadata formatting tweaks are neutral.
Also applies to: 72-73, 99-100, 119-120, 144-144
test/e2e/predictor/test_tensorflow.py (1)
51-55: Networking visibility labels look consistent with the new constants

Both TensorFlow InferenceServices correctly apply `KSERVE_LABEL_NETWORKING_VISIBILITY: KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` and should behave consistently with other updated tests.

Please confirm the label is honored by the RawDeployment routing setup in your cluster when you run the e2e suite.
Also applies to: 97-101
test/e2e/logger/test_logger.py (2)
56-62: Exposed visibility labels on logger and sink services are consistent

Both the logger ISVC and the message-dumper ISVC now carry the exposed networking visibility label, which aligns with the rest of the RawDeployment-focused changes.
Please verify during e2e runs that the labeled services are reachable as expected under your RawDeployment networking configuration.
Also applies to: 89-93
74-74: Confirm `-predictor` service DNS name in RawDeployment

Pointing the logger to `http://{msg_dumper}-predictor.<ns>.svc.cluster.local` makes sense if RawDeployment exposes a `*-predictor` Service; just ensure this matches the actual Service name generated by the controller in your environment.

If possible, confirm via `kubectl get svc -n <ns>` that the sink service is indeed named `{msg_dumper}-predictor`.

test/e2e/batcher/test_batcher_custom_port.py (1)
59-63: Batcher custom-port service correctly marked as exposed

The added networking visibility label on this custom-port batcher service is consistent with other tests and should help RawDeployment routing.
Please confirm the test passes in a RawDeployment cluster and that the service is reachable on the custom port through your configured ingress path.
test/e2e/predictor/test_canary.py (1)
56-58: Metadata formatting change is benign

Splitting `name` and `namespace` over separate lines in `V1ObjectMeta` is purely stylistic and keeps semantics unchanged.

No action needed beyond ensuring the tests remain discoverable under your updated test selection logic.
Also applies to: 124-126
test/e2e/predictor/test_xgboost.py (1)
56-60: XGBoost tests consistently apply exposed networking label

Each XGBoost InferenceService now uses the shared `KSERVE_LABEL_NETWORKING_VISIBILITY`/`KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` constants, which keeps the tests aligned with RawDeployment exposure requirements.

Please validate that all these XGBoost tests can reach their services successfully in a RawDeployment cluster with your current ingress/gateway setup.
Also applies to: 105-109, 160-164, 208-212, 260-264, 309-313, 362-366
python/kserve/kserve/constants/constants.py (1)
40-42: New networking visibility constants are straightforward; just confirm they match controller usage

Defining `KSERVE_LABEL_NETWORKING_VISIBILITY` and `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` here centralizes the label semantics nicely and lets tests avoid hard-coded strings.

Please double-check that the KServe/ODH controllers and manifests are also using `networking.kserve.io/visibility=exposed` so tests and controllers stay in sync.

test/e2e/predictor/test_sklearn.py (1)
62-66: Sklearn InferenceServices correctly tagged with networking visibility

All updated sklearn tests now label their InferenceServices as exposed via the new constants, matching the RawDeployment pattern across the suite.
Please confirm end-to-end that these services are reachable in your RawDeployment environment under the paths your test utilities expect.
Also applies to: 108-112, 157-161, 228-232, 277-281, 344-348, 395-399
test/e2e/batcher/test_batcher.py (1)
55-59: Batcher service exposure label matches the common pattern

Marking the sklearn batcher InferenceService as exposed via the shared label constants is consistent with the rest of the suite.
Please ensure this test passes under RawDeployment and the batch endpoint is accessible using your configured ingress.
test/e2e/predictor/test_pmml.py (1)
54-58: Networking visibility labels look consistent across PMML tests

Adding `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` to the three InferenceServices is consistent with the RawDeployment routing changes and keeps the tests aligned with the rest of the suite. No issues from my side here.

Also applies to: 107-111, 157-161
test/e2e/predictor/test_triton.py (1)
53-57: InferenceService labels align with the new networking visibility convention

Both Triton InferenceServices now include the `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` label, which matches the pattern used elsewhere in the suite and should keep routing consistent under RawDeployment. No further changes needed here.

Also applies to: 142-147
test/e2e/predictor/test_multi_container_probing.py (1)
49-54: get_deployment helper cleanly encapsulates the RawDeployment naming convention

Centralizing the `service_name + "-predictor"` lookup (scoped to `KSERVE_TEST_NAMESPACE`) in `get_deployment` keeps the main test body readable and makes it easy to adjust if the naming scheme ever changes. No issues here.
Force-pushed from 6c67f00 to 9412a0e.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/scripts/openshift-ci/setup-e2e-tests.sh (1)
90-90: Use centralized helper for consistency

Line 90 still uses the inline regex check `if [[ "$1" =~ raw ]]; then` instead of the new `skip_serverless()` helper. This inconsistency could lead to maintenance issues if deployment type patterns need to change later. Refactor to use the centralized function for consistency.

Apply this diff to use the helper:

```diff
-# Add CA certificate extraction for raw deployments
-if [[ "$1" =~ raw ]]; then
+# Add CA certificate extraction for raw deployments
+if skip_serverless "$1"; then
```
🧹 Nitpick comments (5)
test/scripts/openshift-ci/setup-e2e-tests.sh (1)
28-40: Solid refactor: centralized serverless-skipping logic

The new `skip_serverless()` helper and `MARKERS_SKIP_SERVERLESS` array reduce duplication and make future deployment type changes easier to maintain. Note: line 35 uses regex pattern matching (`=~`), which works for simple strings but could be fragile if patterns contain regex metacharacters. Consider using exact string matching (`==`) instead if the intent is to match literal deployment type names.

test/e2e/predictor/test_lightgbm.py (1)
57-62: Networking visibility label looks correct; consider deduplicating the pattern

Using `constants.KSERVE_LABEL_NETWORKING_VISIBILITY` with `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` in `metadata.labels` is a good, explicit way to mark this InferenceService as externally exposed, and it aligns with the rest of the PR.

Since the same label dict is repeated in other tests in this file, you could optionally factor it into a small helper (or a shared `DEFAULT_LABELS_EXPOSED` constant) to avoid drift if the label key/value changes again. Not blocking.

test/e2e/predictor/test_canary.py (1)
36-36: Canary tests permanently skipped in RawDeployment suite

Marking both canary tests as skipped with a clear reason is consistent with RawDeployment mode lacking Knative-based traffic splitting and avoids noisy failures in this job. Longer term, you may want to gate these via a marker or config (e.g., `@pytest.mark.serverless_only` plus environment selection) instead of a hard skip so the same tests can still be exercised in a Knative/Serverless CI lane, but that's an optional follow-up outside this PR's scope.

Also applies to: 101-101
test/e2e/predictor/test_triton.py (1)
116-122: Environment variable guard for transformer image is clear, but consider skip vs failure

Requiring `IMAGE_TRANSFORMER_IMG_TAG` and raising `ValueError` gives a loud signal when CI is misconfigured. If you expect some environments to run the suite without this var, you might instead `pytest.skip` with this message; otherwise this is fine for enforcing CI configuration.

test/e2e/predictor/test_multi_container_probing.py (1)
177-180: Prefer `logger.exception` and consider always cleaning up the InferenceService

Two small improvements here:

- To capture the traceback when the `TimeoutExpiredError` is raised, prefer `logger.exception` over `logger.error` in the `except` block (this also satisfies the Ruff TRY400 hint):

```diff
-    except TimeoutExpiredError as e:
-        logger.error("Timeout waiting for deployment to be ready")
+    except TimeoutExpiredError as e:
+        logger.exception("Timeout waiting for deployment to be ready")
         raise e
```

- Optional: if you want to ensure the `InferenceService` is deleted even when the deployment never becomes ready, you can move the `kserve_client.delete` call into a `finally` block:

```diff
-    try:
-        for deployment in TimeoutSampler(
-            wait_timeout=60,
-            sleep=2,
-            func=lambda: get_deployment(k8s_client, service_name),
-        ):
-            # Wait for Deployment to be ready
-            if deployment.status.ready_replicas and deployment.status.ready_replicas > 0:
-                break
-
-        # Get latest deployment state after ready condition is met
-        ready_deployment = get_deployment(k8s_client, service_name)
-        containers = ready_deployment.spec.template.spec.containers
-        ...
-
-        kserve_client.delete(service_name, KSERVE_TEST_NAMESPACE)
-    except TimeoutExpiredError as e:
-        logger.error("Timeout waiting for deployment to be ready")
-        raise e
+    try:
+        for deployment in TimeoutSampler(
+            wait_timeout=60,
+            sleep=2,
+            func=lambda: get_deployment(k8s_client, service_name),
+        ):
+            # Wait for Deployment to be ready
+            if deployment.status.ready_replicas and deployment.status.ready_replicas > 0:
+                break
+
+        # Get latest deployment state after ready condition is met
+        ready_deployment = get_deployment(k8s_client, service_name)
+        containers = ready_deployment.spec.template.spec.containers
+        ...
+    except TimeoutExpiredError as e:
+        logger.exception("Timeout waiting for deployment to be ready")
+        raise e
+    finally:
+        kserve_client.delete(service_name, KSERVE_TEST_NAMESPACE)
```

(Only apply the `finally` change if you're comfortable always cleaning up the resource on failure; if you rely on leaked resources for post-mortem debugging, you may want to keep the current behavior.)
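The control flow of that `finally`-based cleanup can be shown in isolation; the callables below stand in for the real create/wait/delete operations, and the helper name is illustrative.

```python
def run_isvc_test(create, wait_until_ready, delete):
    """Sketch of the suggested flow: delete runs on success and on timeout alike."""
    create()
    try:
        wait_until_ready()  # may raise a timeout error
    finally:
        delete()            # cleanup happens regardless of the outcome
```
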
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- config/overlays/odh/inferenceservice-config-patch.yaml (1 hunks)
- config/overlays/test/dsc.yaml (1 hunks)
- python/kserve/kserve/constants/constants.py (1 hunks)
- test/e2e/batcher/test_batcher.py (1 hunks)
- test/e2e/batcher/test_batcher_custom_port.py (1 hunks)
- test/e2e/common/utils.py (2 hunks)
- test/e2e/logger/test_logger.py (3 hunks)
- test/e2e/predictor/test_autoscaling.py (5 hunks)
- test/e2e/predictor/test_canary.py (4 hunks)
- test/e2e/predictor/test_lightgbm.py (4 hunks)
- test/e2e/predictor/test_multi_container_probing.py (3 hunks)
- test/e2e/predictor/test_paddle.py (3 hunks)
- test/e2e/predictor/test_pmml.py (3 hunks)
- test/e2e/predictor/test_sklearn.py (8 hunks)
- test/e2e/predictor/test_tensorflow.py (2 hunks)
- test/e2e/predictor/test_triton.py (7 hunks)
- test/e2e/predictor/test_xgboost.py (7 hunks)
- test/e2e/storagespec/test_s3_storagespec.py (1 hunks)
- test/e2e/storagespec/test_s3_tls_storagespec.py (3 hunks)
- test/scripts/openshift-ci/deploy.serverless.sh (1 hunks)
- test/scripts/openshift-ci/run-e2e-tests.sh (1 hunks)
- test/scripts/openshift-ci/setup-e2e-tests.sh (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (13)
- test/scripts/openshift-ci/deploy.serverless.sh
- test/e2e/predictor/test_xgboost.py
- test/e2e/logger/test_logger.py
- config/overlays/odh/inferenceservice-config-patch.yaml
- test/e2e/predictor/test_pmml.py
- test/e2e/common/utils.py
- config/overlays/test/dsc.yaml
- test/scripts/openshift-ci/run-e2e-tests.sh
- python/kserve/kserve/constants/constants.py
- test/e2e/storagespec/test_s3_storagespec.py
- test/e2e/predictor/test_autoscaling.py
- test/e2e/predictor/test_tensorflow.py
- test/e2e/predictor/test_paddle.py
🧰 Additional context used
🧬 Code graph analysis (1)
test/e2e/predictor/test_sklearn.py (2)
python/kserve/kserve/protocol/infer_type.py (3)
name (147-153), name (420-427), name (948-954)

python/kserve/kserve/models/v1beta1_explainer_spec.py (2)

labels (857-865), labels (868-877)
🪛 Ruff (0.14.5)
test/e2e/predictor/test_multi_container_probing.py
179-179: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test
- GitHub Check: Test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: precommit-check
- GitHub Check: test
- GitHub Check: test
- GitHub Check: build (3.12)
- GitHub Check: test
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
- GitHub Check: build (3.9)
🔇 Additional comments (31)
test/scripts/openshift-ci/setup-e2e-tests.sh (4)
89-108: Good: defensive CA certificate extraction with validation

The CA bundle extraction includes proper validation (lines 103-107) and continues gracefully if extraction fails. This supports raw deployments without blocking the setup. The use of `|| true` combined with downstream validation is a solid pattern here.
129-129: Appropriate: server-side apply with conflict handling

The addition of the `--server-side=true --force-conflicts` flags to `oc apply` is appropriate for managing conflicts and reducing server load in CI environments.
111-111: Consistent use of new helper across script

All conditional blocks that check for serverless deployment modes now consistently use the centralized `skip_serverless()` function, improving maintainability.

Also applies to: 136-136, 152-152, 227-227, 251-251
164-164: Improved observability: added an informative log message

The new echo statement provides users with context on what the script is waiting for and an expected timeout, improving the debugging experience for CI runs.
test/e2e/predictor/test_lightgbm.py (3)
100-105: Consistent use of networking visibility label for runtime-based LightGBM test

Same pattern as above; the label addition is correct and consistent with the RawDeployment-focused changes.
159-164: Label addition for v2 MLServer runtime is consistent and correct.
`metadata.labels` mirrors the earlier tests and should ensure this v2 MLServer-based service is exposed in the same way.
216-221: Label addition for v2 KServe LightGBM test is correct

This keeps networking visibility behavior aligned across all HTTP-based LightGBM tests in this file.
test/e2e/predictor/test_canary.py (1)
56-58: Metadata formatting change is non-functional

The `client.V1ObjectMeta` initializations still pass the same `name` and `namespace` keyword arguments; splitting them across lines is purely stylistic and has no behavioral impact.

Also applies to: 124-126
test/e2e/storagespec/test_s3_tls_storagespec.py (3)
115-121: Networking visibility label for global-custom-cert positive case looks correct

Adding `labels` on `V1ObjectMeta` with `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` is consistent with exposing this InferenceService in RawDeployment and should not affect existing behavior.
196-202: Label addition for custom-cert positive/negative test pair is consistent

Same networking-visibility label pattern as other tests; matches the RawDeployment exposure model and keeps reuse of the `isvc` object safe.
261-267: Serving-cert storagespec test correctly adopts networking visibility label

The metadata update to include the visibility label mirrors the other TLS tests and should help keep routing behavior consistent in RawDeployment.
test/e2e/batcher/test_batcher.py (1)
55-59: Batcher InferenceService now correctly labeled for networking visibility

Applying `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` at the metadata level aligns this test with the RawDeployment model and the rest of the suite.

test/e2e/batcher/test_batcher_custom_port.py (1)
59-63: Custom-port batcher test adopts networking visibility label correctly

Labeling this InferenceService as exposed keeps behavior consistent with other RawDeployment tests; no issues with the custom port configuration.
test/e2e/predictor/test_sklearn.py (8)
62-66: sklearn HTTP test metadata labeling is consistent and safe

Adding the networking visibility label here matches the broader test suite changes and should help expose the service correctly in RawDeployment.
108-112: v2 HTTP (mlserver) test picks up the correct visibility label

The metadata label for networking visibility is applied in the same way as other tests; no issues spotted.
157-161: Runtime-based sklearn test correctly labeled for exposure

The label addition on `test_sklearn_runtime_kserve` keeps runtime-backed services aligned with the RawDeployment exposure pattern.
228-232: v2 runtime (mlserver) HTTP test uses the shared visibility label

Consistent use of `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` on metadata here is appropriate and matches the other v2 tests.
277-281: v2 sklearn HTTP test adopts networking visibility label as expected

Labeling this InferenceService as exposed is in line with the rest of the suite and should help keep routing behavior uniform.
325-325: Use of `protocol_version="grpc-v2"` in ModelSpec

Assuming the deployed sklearn server supports the `"grpc-v2"` protocol identifier, this is the right place to declare it and should make the gRPC v2 test configuration clearer.

Please double-check against the KServe / runtime docs that `"grpc-v2"` is the expected `protocol_version` string for this model server.
344-348: gRPC v2 sklearn test metadata labeled for networking visibility

Even though this test is currently skipped, adding the label now ensures it will behave consistently once re-enabled.
394-398: Mixed-type v2 sklearn HTTP test correctly adds networking visibility label

This brings the mixed-type test in line with the other sklearn predictors from a routing/visibility standpoint.
test/e2e/predictor/test_triton.py (6)
53-57: Base Triton test metadata now labeled for networking visibility

Applying the visibility label here is consistent with other predictors and should help RawDeployment routing without side effects.
71-76: Switch to listing Services by label improves failure diagnostics in RawDeployment

Using `list_namespaced_service` with the `serving.kserve.io/inferenceservice` label is a good fit for RawDeployment and avoids assuming a specific Service name; the loop printing each Service is appropriate for debugging.
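The discovery pattern reduces to filtering Services on one label. A cluster-free sketch using dict-based Service stand-ins (the real code instead passes `label_selector` to `list_namespaced_service`):

```python
ISVC_LABEL = "serving.kserve.io/inferenceservice"


def services_for_isvc(services, service_name):
    """Filter Service-like mappings by the InferenceService ownership label.

    Mirrors list_namespaced_service(ns, label_selector=f"{ISVC_LABEL}={name}")
    over an in-memory list, for illustration only.
    """
    return [
        svc for svc in services
        if (svc.get("labels") or {}).get(ISVC_LABEL) == service_name
    ]
```
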
98-98: Skip reason updated with concrete reactivation condition. Tying the skip reason to a specific PR makes it clearer when this test should be revisited and re-enabled.
127-130: Transformer container now parameterized by env-provided image and explicit HTTP port. Using `transformer_image` from the environment and adding an explicit `8080` HTTP port (`name="http1"`) makes the transformer configuration more flexible and clearer; no issues spotted.
142-146: Transformer Triton test metadata labeled for networking visibility. The added label matches the pattern used elsewhere and should ensure this composite InferenceService is discoverable in RawDeployment.
161-167: Failure logging updated to list Services by label for transformer test. Mirroring the base Triton test, listing Services via the inferenceservice label improves diagnostics without depending on Knative-specific resources.
test/e2e/predictor/test_multi_container_probing.py (4)
49-54: Deployment helper matches RawDeployment usage. The `get_deployment` helper correctly targets the predictor Deployment via `AppsV1Api.read_namespaced_deployment` in `KSERVE_TEST_NAMESPACE`, which is consistent with the RawDeployment test setup.
133-135: Networking visibility label aligns with exposed RawDeployment behavior. Adding `constants.KSERVE_LABEL_NETWORKING_VISIBILITY: constants.KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` on the InferenceService metadata is appropriate for ensuring the service is exposed in RawDeployment mode and matches the broader PR direction.
145-160: Deployment readiness wait via `ready_replicas` is reasonable. Using `TimeoutSampler` with a 60s timeout and checking `deployment.status.ready_replicas > 0` is a sensible adaptation from Knative readiness to Deployment readiness for this single-replica test. Re-fetching the Deployment into `ready_deployment` before inspecting containers is also a clean pattern.
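The readiness wait described here amounts to polling the Deployment until `status.ready_replicas` is positive. A minimal sketch, with a generic `fetch` callable standing in for `AppsV1Api.read_namespaced_deployment` and plain `TimeoutError` in place of the test's `TimeoutExpiredError`:

```python
import time
from types import SimpleNamespace

def wait_for_ready_replicas(fetch, timeout=60, interval=1):
    """Poll fetch() until the returned deployment reports a ready replica."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        deployment = fetch()
        # ready_replicas is None until the first pod becomes Ready.
        if (deployment.status.ready_replicas or 0) > 0:
            return deployment
        time.sleep(interval)
    raise TimeoutError("Timeout waiting for deployment to be ready")

def fake_deployment(ready):
    # Tiny stand-in for a V1Deployment exposing only the polled field.
    return SimpleNamespace(status=SimpleNamespace(ready_replicas=ready))
```

Note the `or 0` guard: the Kubernetes status field is `None` before any replica is ready, so a bare `> 0` comparison would raise.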
162-166: Container selection and probe assertions look correct. Selecting `kserve-container` and `kserve-agent` from `ready_deployment.spec.template.spec.containers` and asserting non-`None` `liveness_probe` and `readiness_probe` on both containers directly validates that the probes are attached as intended. Also applies to: 168-175
Force-pushed from 9412a0e to dc2074c
Actionable comments posted: 0
♻️ Duplicate comments (1)
config/overlays/test/dsc.yaml (1)
20-20: Configuration inconsistency persists: verify RawDeployment mode is set. The past review flagged that `defaultDeploymentMode: Serverless` (line 20) is incompatible with `serving.managementState: Removed` (line 28), as Serverless mode requires Knative Serving to be enabled. The comment indicates this was addressed in earlier commits, but the code still shows `Serverless`. Verify that the correct configuration is applied or that an overlay/patch properly sets `defaultDeploymentMode` to `RawDeployment`. Run this script to verify the effective configuration:

```shell
#!/bin/bash
# Check if RawDeployment is set in any overlay or patch files
echo "=== Searching for defaultDeploymentMode settings ==="
rg -n "defaultDeploymentMode" config/overlays/test/ -A 1 -B 1
echo -e "\n=== Checking for kserve config patches ==="
fd -e yaml -e yml . config/overlays/test/ --exec cat {} \; --exec echo "---" \;
```
🧹 Nitpick comments (4)
test/e2e/predictor/test_multi_container_probing.py (1)
179-179: Optional: uselogger.exceptionfor exception logging.Static analysis suggests using
logger.exceptioninstead oflogger.errorwithin exception handlers, as it automatically includes traceback information.Apply this diff:
- logger.error("Timeout waiting for deployment to be ready") + logger.exception("Timeout waiting for deployment to be ready")test/e2e/common/utils.py (1)
311-321: Env-based `CI_USE_ISVC_HOST` override and logging look reasonable. The new logic cleanly lets CI force use of the external host from `status.url`, with clear logging of the env var, host, and network layer, and sensible fallbacks for `istio`/`istio-ingress`. For parity, you might optionally add similar "Using … cluster IP" logs for the `envoy-gatewayapi` and `istio-gatewayapi` branches, or document that `CI_USE_ISVC_HOST` takes precedence over `network_layer` in all modes.
test/e2e/storagespec/test_s3_tls_storagespec.py (1)
73-76: Temporary skips for TLS S3 tests are clearly annotated. Marking these TLS S3 tests as skipped with a direct reference to RHOAIENG-39707 is a reasonable way to unblock RawDeployment work. Just ensure that ticket is tracking re-enabling them so TLS coverage doesn't remain disabled indefinitely. Also applies to: 155-158, 237-240
test/e2e/predictor/test_sklearn.py (1)
442-449: Preemptively aligning `test_sklearn_v2_mixed_grpc` metadata with new labeling. Even though this gRPC mixed test remains skipped, its InferenceService metadata still lacks the networking-visibility label that other tests now set. For future re-enablement, you might consider adding the same label here to avoid surprises later.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
- config/overlays/odh/inferenceservice-config-patch.yaml (1 hunks)
- config/overlays/test/dsc.yaml (1 hunks)
- python/kserve/kserve/constants/constants.py (1 hunks)
- test/e2e/batcher/test_batcher.py (1 hunks)
- test/e2e/batcher/test_batcher_custom_port.py (1 hunks)
- test/e2e/common/utils.py (2 hunks)
- test/e2e/logger/test_logger.py (3 hunks)
- test/e2e/predictor/test_autoscaling.py (5 hunks)
- test/e2e/predictor/test_canary.py (4 hunks)
- test/e2e/predictor/test_lightgbm.py (4 hunks)
- test/e2e/predictor/test_mlflow.py (1 hunks)
- test/e2e/predictor/test_multi_container_probing.py (3 hunks)
- test/e2e/predictor/test_paddle.py (3 hunks)
- test/e2e/predictor/test_pmml.py (3 hunks)
- test/e2e/predictor/test_sklearn.py (8 hunks)
- test/e2e/predictor/test_tensorflow.py (2 hunks)
- test/e2e/predictor/test_triton.py (7 hunks)
- test/e2e/predictor/test_xgboost.py (7 hunks)
- test/e2e/storagespec/test_s3_storagespec.py (1 hunks)
- test/e2e/storagespec/test_s3_tls_storagespec.py (6 hunks)
- test/scripts/openshift-ci/deploy.serverless.sh (1 hunks)
- test/scripts/openshift-ci/run-e2e-tests.sh (2 hunks)
- test/scripts/openshift-ci/setup-e2e-tests.sh (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- test/scripts/openshift-ci/deploy.serverless.sh
- test/scripts/openshift-ci/setup-e2e-tests.sh
- test/e2e/predictor/test_autoscaling.py
- test/e2e/predictor/test_pmml.py
- test/e2e/batcher/test_batcher.py
- test/scripts/openshift-ci/run-e2e-tests.sh
- test/e2e/predictor/test_lightgbm.py
- test/e2e/predictor/test_tensorflow.py
- config/overlays/odh/inferenceservice-config-patch.yaml
- test/e2e/predictor/test_canary.py
🧰 Additional context used
🧬 Code graph analysis (3)
test/e2e/predictor/test_paddle.py (1)
python/kserve/kserve/models/v1beta1_inference_service.py (2)
metadata (136-143), metadata (146-154)
test/e2e/common/utils.py (1)
test/e2e/conftest.py (1)
network_layer (80-81)
test/e2e/predictor/test_triton.py (1)
python/kserve/kserve/api/kserve_client.py (1)
get (166-223)
🪛 Ruff (0.14.5)
test/e2e/predictor/test_multi_container_probing.py
179-179: Use `logging.exception` instead of `logging.error`; replace with `exception` (TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build (3.11)
- GitHub Check: build (3.9)
- GitHub Check: build (3.10)
- GitHub Check: Test
- GitHub Check: build (3.12)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: precommit-check
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (21)
test/e2e/predictor/test_xgboost.py (1)
56-60: LGTM — consistent networking visibility labeling. All InferenceService instances now include the networking visibility label, properly using the constants defined in `constants.py`. The formatting is consistent across all test functions. Also applies to: 105-109, 160-164, 208-212, 260-264, 309-313, 362-366
python/kserve/kserve/constants/constants.py (1)
40-42: LGTM — well-defined networking visibility constants. The new label constants follow KServe naming conventions and provide a clean way to manage networking visibility across the test suite.
test/e2e/predictor/test_mlflow.py (1)
64-68: LGTM — consistent with test suite pattern. Networking visibility label properly added to InferenceService metadata.
test/e2e/predictor/test_paddle.py (2)
55-61: LGTM — networking visibility labels added consistently. Three Paddle test scenarios properly include the networking visibility label in InferenceService metadata.
Also applies to: 109-115, 164-170
227-227: I'm experiencing technical difficulties accessing the repository to verify your concern. The repository cloning is failing, which prevents me from:
- Examining the full `test_paddle_v2_grpc` function and its metadata configuration
- Comparing with `test_xgboost_v2_grpc` and other skipped gRPC tests in similar files
- Confirming whether `KSERVE_LABEL_NETWORKING_VISIBILITY` labels are present or consistently applied
- Determining if the omission is intentional or an oversight

Skipped test missing networking visibility labels: the `test_paddle_v2_grpc` function at line 227 doesn't include the networking visibility labels, unlike other tests in this file. Other skipped tests in the PR (e.g., `test_xgboost_v2_grpc` in test_xgboost.py) still have the labels added. Is this omission intentional?

test/e2e/predictor/test_triton.py (3)
53-57: LGTM — networking visibility labels properly added. Both Triton test scenarios now include the networking visibility label in InferenceService metadata. Also applies to: 142-146
116-121: LGTM — defensive environment variable validation. The pre-flight check for `IMAGE_TRANSFORMER_IMG_TAG` provides clear, actionable error messages if the required environment variable is missing.
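The guard pattern being praised here, generalized as a small hypothetical helper (the actual test checks `IMAGE_TRANSFORMER_IMG_TAG` inline; `require_env` and its message are illustrative):

```python
import os

def require_env(name: str) -> str:
    """Fail fast with an actionable message when a required env var is unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise ValueError(f"Required environment variable {name} is not set or empty")
    return value
```

Raising early like this surfaces CI misconfiguration before any cluster resources are created, instead of deploying a Pod with an empty image reference.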
71-76: LGTM — error handling adapted for label-based service discovery. Switching from direct service lookups to label-based listing aligns with the test suite's move to RawDeployment mode.
Also applies to: 161-166
test/e2e/predictor/test_multi_container_probing.py (3)
49-54: LGTM — properly refactored for RawDeployment mode. The function correctly switches from reading Knative Services (`CustomObjectsApi`) to reading Kubernetes Deployments (`AppsV1Api`), aligning with the PR objective.
148-175: LGTM — Deployment readiness check and probe verification updated correctly. The wait logic properly checks `deployment.status.ready_replicas`, and container/probe access now uses object attributes (e.g., `container.name`, `container.liveness_probe`) instead of dictionary indexing, matching the Kubernetes Python client API.
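The attribute-based container selection noted here can be sketched in a few lines, using `SimpleNamespace` to stand in for the Kubernetes client's typed container objects:

```python
from types import SimpleNamespace

def find_container(deployment, name):
    """Return the named container from a Deployment's pod template, or None."""
    containers = deployment.spec.template.spec.containers
    # Containers are objects with attributes (c.name), not dicts (c["name"]).
    return next((c for c in containers if c.name == name), None)

def demo_deployment():
    # Minimal stand-in for a V1Deployment with two named containers.
    containers = [
        SimpleNamespace(name="kserve-container", liveness_probe=object()),
        SimpleNamespace(name="kserve-agent", liveness_probe=object()),
    ]
    spec = SimpleNamespace(
        template=SimpleNamespace(spec=SimpleNamespace(containers=containers))
    )
    return SimpleNamespace(spec=spec)
```

The probe assertions in the test then reduce to checking that `find_container(...)` returns an object whose `liveness_probe`/`readiness_probe` attributes are not `None`.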
133-135: LGTM — networking visibility label added. The InferenceService metadata now includes the networking visibility label, consistent with other tests.
test/e2e/batcher/test_batcher_custom_port.py (1)
59-63: LGTM — consistent labeling. Networking visibility label properly added to the batcher test InferenceService metadata.
test/e2e/storagespec/test_s3_storagespec.py (1)
56-60: Networking visibility label on S3 storagespec ISVC is consistent. Setting `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` on the InferenceService metadata aligns with the rest of the suite and should help ensure consistent routing in RawDeployment mode.
test/e2e/common/utils.py (1)
271-277: Minor whitespace tweak in `predict_grpc` only. This hunk only adds spacing around the `model_name` check and `grpc_client` call; it has no behavioral impact.
test/e2e/logger/test_logger.py (2)
56-62: Networking visibility labels on logger ISVCs are aligned with the suite. Applying `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` to both the message-dumper and logger InferenceServices matches the new pattern elsewhere and should keep routing behavior consistent under RawDeployment. Also applies to: 89-93
72-75: Please confirm logger sink URL matches the actual RawDeployment Service name. Switching the logger URL to `http://{msg_dumper}-predictor.{namespace}.svc.cluster.local` looks like the right direction for RawDeployment, but service naming sometimes includes extra suffixes (e.g., `-default`). It's worth double-checking that this hostname matches the created Service so that log events are not silently dropped.
test/e2e/storagespec/test_s3_tls_storagespec.py (2)
117-121: Networking visibility labels on TLS S3 ISVCs are consistent. Adding `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` to the TLS S3 InferenceServices brings them in line with other tests and should help keep routing behavior uniform. Also applies to: 199-203, 265-269
283-287: Secret patch for invalid TLS data connection is correctly base64-encoded. Using `b64encode(invalid_data_connection.encode()).decode()` when patching the `storage-config` Secret matches Kubernetes' expectation that `data` values are base64 strings; this looks correct.
test/e2e/predictor/test_sklearn.py (3)
62-66: Consistent networking visibility labeling across sklearn tests. All updated sklearn InferenceServices now set `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` in `metadata.labels`, which is consistent with the rest of the e2e suite and should help keep endpoint exposure behavior uniform in RawDeployment mode. Also applies to: 108-112, 157-161, 228-232, 277-281, 344-348, 394-398
323-337: gRPC v2 `protocol_version` flag in `test_sklearn_v2_grpc`. Setting `protocol_version="grpc-v2"` on `V1beta1ModelSpec` matches the intent of this test name and payload. Since this test is still skipped, impact is low, but please verify that `"grpc-v2"` is exactly the value expected by the serving runtime for gRPC v2.
371-417: Unskipping `test_sklearn_v2_mixed`: ensure the environment is ready. With the skip removed, this mixed-type sklearn v2 test will now run and assert on a specific numeric output for the `1.3/mixedtype` model. Make sure the target clusters have that model artifact available and that the RawDeployment networking changes don't introduce flakiness here.
… RawDeployment mode only
chore: Fix tests.
Adapt the setup and run e2e tests to work with Raw Deployment
Update failing tests to work with Raw Deployment mode
Signed-off-by: Spolti <[email protected]>
Force-pushed from dc2074c to ddcdf3a
Actionable comments posted: 1
♻️ Duplicate comments (6)
test/e2e/storagespec/test_s3_tls_storagespec.py (4)
155-158: Duplicate: same temporary skip pattern as first S3 TLS test. This skip marker follows the same RHOAIENG-39707 pattern as the earlier test; treating it the same way (temporary infra/behavior issue) makes sense.
198-203: Duplicate: same metadata visibility/namespace pattern as first ISVC. This InferenceService metadata uses the same explicit namespace and networking-visibility label as the earlier S3 TLS test, which is consistent and appropriate for RawDeployment.
237-240: Duplicate: third S3 TLS test is also cleanly skipped with the same tracking reference. Same skip approach and rationale as the other two TLS tests; keeping them aligned under the same tracking ticket is good.
264-269: Duplicate: consistent networking-visibility labeling across all S3 TLS ISVCs. This final InferenceService's metadata matches the visibility/namespace pattern used above, which keeps these tests uniform and easier to reason about.
test/e2e/predictor/test_triton.py (2)
141-146: Duplicate: transformer ISVC metadata uses the same namespace/visibility convention. The transformer scenario's InferenceService metadata mirrors the namespace and visibility labeling used in the plain Triton test, which keeps behavior consistent.
160-166: Duplicate: service listing on transformer test failures follows same debug pattern. The failure-path logging for services in the transformer test matches the pattern from `test_triton`, giving you consistent diagnostics across both tests.
🧹 Nitpick comments (4)
test/e2e/common/utils.py (1)
311-320: Environment-driven endpoint selection looks good. The `CI_USE_ISVC_HOST` environment variable provides flexibility for RawDeployment mode tests, and the debug logging will help diagnose CI issues. Minor observation: the `envoy-gatewayapi` and `istio-gatewayapi` branches (lines 321-330) lack the corresponding `logger.info(f"Using internal cluster IP: {cluster_ip}")` statement. Consider adding similar logging there for consistency if those paths are exercised in your CI.

```diff
 elif network_layer == "envoy-gatewayapi":
     cluster_ip = get_cluster_ip(
         namespace="envoy-gateway-system",
         labels={"serving.kserve.io/gateway": "kserve-ingress-gateway"},
     )
+    logger.info(f"Using envoy-gatewayapi cluster IP: {cluster_ip}")
 elif network_layer == "istio-gatewayapi":
     cluster_ip = get_cluster_ip(
         namespace="kserve",
         labels={"serving.kserve.io/gateway": "kserve-ingress-gateway"},
     )
+    logger.info(f"Using istio-gatewayapi cluster IP: {cluster_ip}")
 else:
     raise ValueError(f"Unknown network layer {network_layer}")
```

test/scripts/openshift-ci/setup-e2e-tests.sh (1)
129-129: `--force-conflicts` may mask legitimate resource conflicts. Adding
--force-conflictsallows overwriting fields managed by other controllers. While this can resolve CI flakiness, it may hide configuration drift or conflicts that should be investigated.Consider documenting why this flag is needed (e.g., in a comment) to help future maintainers understand when it's safe to remove.
test/e2e/predictor/test_multi_container_probing.py (1)
178-180: Uselogging.exceptioninstead oflogging.errorto capture the traceback.Per static analysis hint,
logging.exceptionautomatically includes exception info which aids debugging.except TimeoutExpiredError as e: - logger.error("Timeout waiting for deployment to be ready") + logger.exception("Timeout waiting for deployment to be ready") raise etest/e2e/predictor/test_triton.py (1)
70-76: Extra service listing on failure improves debuggabilityListing services by the inferenceservice label and printing them in the
wait_isvc_readyfailure path should make it much easier to debug routing or selector issues when this test flakes. This is a reasonable trade-off since it only runs on failure.If the printed objects become too noisy, consider logging only selected fields (name, labels, ports) instead of the full object, but this is non-blocking.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
- config/overlays/odh/inferenceservice-config-patch.yaml (1 hunks)
- config/overlays/test/dsc.yaml (1 hunks)
- python/kserve/kserve/constants/constants.py (1 hunks)
- test/e2e/batcher/test_batcher.py (1 hunks)
- test/e2e/batcher/test_batcher_custom_port.py (1 hunks)
- test/e2e/common/utils.py (2 hunks)
- test/e2e/logger/test_logger.py (3 hunks)
- test/e2e/predictor/test_autoscaling.py (5 hunks)
- test/e2e/predictor/test_canary.py (4 hunks)
- test/e2e/predictor/test_lightgbm.py (4 hunks)
- test/e2e/predictor/test_mlflow.py (1 hunks)
- test/e2e/predictor/test_multi_container_probing.py (3 hunks)
- test/e2e/predictor/test_paddle.py (3 hunks)
- test/e2e/predictor/test_pmml.py (3 hunks)
- test/e2e/predictor/test_sklearn.py (8 hunks)
- test/e2e/predictor/test_tensorflow.py (2 hunks)
- test/e2e/predictor/test_triton.py (7 hunks)
- test/e2e/predictor/test_xgboost.py (7 hunks)
- test/e2e/storagespec/test_s3_storagespec.py (1 hunks)
- test/e2e/storagespec/test_s3_tls_storagespec.py (6 hunks)
- test/scripts/openshift-ci/deploy.serverless.sh (1 hunks)
- test/scripts/openshift-ci/run-e2e-tests.sh (2 hunks)
- test/scripts/openshift-ci/setup-e2e-tests.sh (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (13)
- test/e2e/predictor/test_tensorflow.py
- test/e2e/logger/test_logger.py
- python/kserve/kserve/constants/constants.py
- test/e2e/predictor/test_mlflow.py
- test/scripts/openshift-ci/deploy.serverless.sh
- config/overlays/test/dsc.yaml
- test/e2e/predictor/test_xgboost.py
- test/e2e/batcher/test_batcher_custom_port.py
- test/e2e/batcher/test_batcher.py
- config/overlays/odh/inferenceservice-config-patch.yaml
- test/e2e/predictor/test_sklearn.py
- test/e2e/predictor/test_canary.py
- test/e2e/storagespec/test_s3_storagespec.py
🧰 Additional context used
🧬 Code graph analysis (3)
test/e2e/predictor/test_paddle.py (1)
python/kserve/kserve/models/v1beta1_inference_service.py (2)
metadata (136-143), metadata (146-154)
test/e2e/predictor/test_triton.py (1)
python/kserve/kserve/api/kserve_client.py (1)
get (166-223)
test/e2e/common/utils.py (1)
test/e2e/conftest.py (1)
network_layer (80-81)
🪛 Ruff (0.14.6)
test/e2e/predictor/test_multi_container_probing.py
179-179: Use `logging.exception` instead of `logging.error`; replace with `exception` (TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: Test
- GitHub Check: precommit-check
- GitHub Check: test
- GitHub Check: build (3.11)
- GitHub Check: build (3.9)
- GitHub Check: build (3.10)
- GitHub Check: test
- GitHub Check: build (3.12)
- GitHub Check: test
🔇 Additional comments (14)
test/e2e/common/utils.py (1)
275-275: Formatting-only change. No behavioral impact from the blank line addition.
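Since this file also carries the `CI_USE_ISVC_HOST` endpoint override discussed in this review, here is a simplified, hypothetical sketch of that precedence (the function name and arguments are illustrative, not the actual `get_isvc_endpoint` signature):

```python
import os

def choose_endpoint(external_host, cluster_ip, network_layer):
    """Prefer the external host when CI_USE_ISVC_HOST=1; otherwise fall
    back to the cluster IP for the istio-based network layers."""
    if os.environ.get("CI_USE_ISVC_HOST") == "1":
        # CI forces the external host taken from the ISVC status.url.
        return external_host
    if network_layer in ("istio", "istio-ingress"):
        return cluster_ip
    raise ValueError(f"Unknown network layer {network_layer}")
```

The key design point is that the environment override is checked first, so it takes precedence over the `network_layer` argument in every mode.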
test/e2e/predictor/test_autoscaling.py (1)
53-53: LGTM - Skip markers appropriately disable Knative-dependent tests. The skip markers correctly identify that these tests rely on Knative-specific autoscaling annotations (`autoscaling.knative.dev/target`, `autoscaling.knative.dev/metric`) which are not applicable in RawDeployment mode. Also applies to: 99-99, 144-144
test/e2e/predictor/test_multi_container_probing.py (1)
49-54: LGTM - Function correctly refactored for RawDeployment mode. The transition from `CustomObjectsApi` (Knative services) to `AppsV1Api` (Kubernetes Deployments) aligns with the RawDeployment mode requirements.
test/scripts/openshift-ci/run-e2e-tests.sh (1)
64-67: LGTM - Good defensive check for CA certificate. The pre-check ensures the setup script completed successfully before proceeding with tests, providing a clear error message if the CA certificate is missing.
test/e2e/predictor/test_paddle.py (1)
55-61: LGTM - Networking visibility labels consistently applied. The labels are correctly added using the constants `KSERVE_LABEL_NETWORKING_VISIBILITY` and `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` across all active test cases. Also applies to: 109-115, 164-170
test/e2e/predictor/test_lightgbm.py (1)
57-61: LGTM - Networking visibility labels correctly applied to all active tests. The labeling pattern is consistent with other test files in this PR.
Also applies to: 100-104, 159-163, 216-220
test/e2e/predictor/test_pmml.py (1)
54-58: LGTM - Networking visibility labels correctly applied. The pattern follows the established convention from other test files in this PR.
Also applies to: 107-111, 157-161
test/scripts/openshift-ci/setup-e2e-tests.sh (1)
28-40: Based on my investigation, I need to mark this for manual verification since I cannot access the repository directly to confirm the actual test patterns and intent. However, I can provide insight based on the code logic: the original review comment raises a valid concern that requires confirmation from maintainers.

The code logic demonstrates the issue:
- The `skip_serverless()` function at lines 28-40 uses regex matching: `if [[ "$deployment_type" =~ $type ]]`
- Including `"predictor"` in `MARKERS_SKIP_SERVERLESS` means ANY deployment type containing "predictor" will skip serverless installation
- Since pytest markers can be combined (e.g., `@pytest.mark.predictor @pytest.mark.raw`), this regex will match all predictor variants

The concern is valid because:
- "predictor" is a broad category that subsumes other test types
- The distinction between "raw", "graph", and "predictor" suggests these are separate concerns
- The regex matching approach may inadvertently skip serverless for predictor tests that should require it

However, without access to the actual test files, git history, and test execution requirements, I cannot definitively determine if this is:
- Intentional (all predictor tests should skip serverless)
- A mistake (some predictor tests need serverless)
- Redundant (the "raw" and "graph" markers already cover the necessary cases)

Verify whether serverless should be skipped for all predictor tests, or whether some predictor variants still need serverless mode. Consider checking against the test specifications or confirming the intended test execution matrix.
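The regex-matching behavior under discussion can be reproduced in a few lines. This is a Python sketch of the bash `[[ "$deployment_type" =~ $type ]]` loop; the marker list is an assumption based on the markers named in this comment:

```python
import re

# Assumed marker list; the real values live in setup-e2e-tests.sh.
MARKERS_SKIP_SERVERLESS = ["raw", "graph", "predictor"]

def skips_serverless(deployment_type: str) -> bool:
    """True when any skip marker regex-matches the pytest marker expression."""
    return any(re.search(marker, deployment_type) for marker in MARKERS_SKIP_SERVERLESS)
```

Because this is a substring-style regex match rather than an exact comparison, any marker expression that merely contains "predictor" skips the serverless install, which is exactly the breadth concern raised above.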
test/e2e/storagespec/test_s3_tls_storagespec.py (2)
73-76: Skip marker for broken S3 TLS test is fine and clearly scoped. Tying the skip reason to RHOAIENG-39707 makes it clear this is temporary and why the test is disabled. No change needed here as long as the ticket tracks re-enabling the test later.
116-121: Explicit namespace + networking-visibility label align this ISVC with RawDeployment routing. Setting `namespace=KSERVE_TEST_NAMESPACE` and adding the `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` label on the InferenceService metadata matches the RawDeployment exposure pattern used elsewhere in the suite and should help ensure the service is reachable in the test namespace. Please re-run this e2e test once RawDeployment wiring is in place to confirm the controller is honoring the visibility label as expected.
test/e2e/predictor/test_triton.py (4)
52-57: InferenceService metadata now correctly pins namespace and exposes networking visibility. Adding `namespace=KSERVE_TEST_NAMESPACE` and the `KSERVE_LABEL_NETWORKING_VISIBILITY_EXPOSED` label directly on the Triton InferenceService is a good fit for RawDeployment and brings this test in line with the rest of the suite. Please confirm in CI runs that the resulting Service/Route wiring behaves as expected with this label (i.e., no regressions in how the Triton endpoint is reached).
95-99: Skip marker for transformer test is clearly tied to external controller PR. Marking the transformer test as skipped until the referenced `odh-model-controller` PR is merged is a clear and maintainable way to document why it isn't running yet.
116-121: Env-var guard for transformer image makes configuration failures explicit. Checking `IMAGE_TRANSFORMER_IMG_TAG` up front and raising a `ValueError` with a clear message prevents the test from silently deploying an invalid or empty image, which is useful for catching CI misconfigurations. If you'd rather treat a missing image tag as "test not applicable" instead of an infra failure, you could switch this to `pytest.skip(error_msg)`; otherwise, this hard failure behavior is fine as-is.
127-130: Transformer container now correctly uses env-derived image and exposes HTTP port 8080. Wiring `image=transformer_image` and explicitly defining port `8080` with name `http1` matches the usual RawDeployment assumptions for the user container's HTTP endpoint and should work well with the routing layer. Once the skip is removed, please verify in a test run that the transformer Pod comes up with the expected image and port configuration (e.g., by inspecting the created Deployment/Pod).
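As a rough sketch of the container wiring being reviewed, expressed as plain Kubernetes-style fields rather than the kserve client models (field names follow the pod container schema; the image value passed in below is illustrative):

```python
def transformer_container(transformer_image: str) -> dict:
    """Build a container spec dict with the explicit http1 port on 8080."""
    return {
        "name": "kserve-container",
        "image": transformer_image,
        # Naming the port "http1" tells the routing layer this is plain HTTP/1.1.
        "ports": [{"name": "http1", "containerPort": 8080, "protocol": "TCP"}],
    }
```

Declaring the port explicitly, rather than relying on defaults, keeps the RawDeployment Service and the container in agreement about where traffic should land.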
```shell
export GITHUB_SHA=master

if [ "$BUILD_KSERVE_IMAGES" = "true" ]; then
    echo "asd"
```
Remove debug output `echo "asd"`. This appears to be leftover debug output that should be removed before merging.

```diff
 if [ "$BUILD_KSERVE_IMAGES" = "true" ]; then
-    echo "asd"
     pushd $PROJECT_ROOT >/dev/null
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```shell
if [ "$BUILD_KSERVE_IMAGES" = "true" ]; then
    pushd $PROJECT_ROOT >/dev/null
```
🤖 Prompt for AI Agents
In test/scripts/openshift-ci/run-e2e-tests.sh around line 40, remove the
leftover debug output echo "asd" from the script; simply delete that line so the
script contains only intended production/test commands and no stray debugging
prints.
/test e2e-graph |
Jooho left a comment
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Jooho, spolti. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Merged f0eaa31 into opendatahub-io:master
… RawDeployment mode only
chore: Fix tests.
Adapt the setup and run e2e tests to work with Raw Deployment
Update failing tests to work with Raw Deployment mode
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
Type of changes
Please delete options that are not relevant.
Feature/Issue validation/testing:
Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Test B
Logs
Special notes for your reviewer:
Checklist:
Release note:
Re-running failed tests
- `/rerun-all` - rerun all failed workflows.
- `/rerun-workflow <workflow name>` - rerun a specific failed workflow. Only one workflow name can be specified. Multiple `/rerun-workflow` commands are allowed per comment.
New Features
Bug Fixes
Configuration Changes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.