@sutaakar
Contributor

@sutaakar sutaakar commented Dec 2, 2025

Description

The Trainer v2 controller requires the JobSet CRD to be available in order to start properly.
This PR adds a JobSet CRD check to the trainer component preconditions so that an unavailable JobSet CRD is reported in the DSC status.

How Has This Been Tested?

Screenshot or short clip

Merge criteria

  • You have read the contributors guide.
  • Commit messages are meaningful - have a clear and concise summary and detailed explanation of what was changed and why.
  • Pull Request contains a description of the solution, a link to the JIRA issue, and to any dependent or related Pull Request.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work
  • The developer has run the integration test pipeline and verified that it passed successfully

E2E test suite update requirement

When bringing new changes to the operator code, such changes are by default required to be accompanied by extending and/or updating the E2E test suite accordingly.

To opt-out of this requirement:

  1. Please inspect the opt-out guidelines, to determine if the nature of the PR changes allows for skipping this requirement
  2. If opt-out is applicable, provide justification in the dedicated E2E update requirement opt-out justification section below
  3. Check the checkbox below:
  • Skip requirement to update E2E test suite for this PR
  4. Submit/save these changes to the PR description. This will automatically trigger the check.

E2E update requirement opt-out justification

Summary by CodeRabbit

  • New Features

    • Initialization now verifies the JobSet CRD exists and returns a clear, actionable message if missing.
  • Tests

    • Added unit tests for both missing and present JobSet CRD scenarios.
    • Added end-to-end validation to ensure required JobSetOperator resources are created.
  • Misc

    • Added a visible status message constant and new resource identifier for JobSet to support checks and tests.

@coderabbitai
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

Adds a JobSet CRD presence check to the trainer controller's pre-condition validation that returns a StopError with status.JobSetCRDMissingMessage when absent. Introduces JobSetv1alpha2 GVK, a status constant, e2e helpers and a test that validate resource creation, plus unit tests for CRD present/missing.

Changes

Cohort / File(s) | Summary
Controller change: internal/controller/components/trainer/trainer_controller_actions.go | Add JobSet CRD existence check in checkPreConditions using cluster.HasCRD; return StopError with formatted message on error or missing CRD.
Unit tests (trainer): internal/controller/components/trainer/trainer_controller_actions_test.go | Add tests: TestCheckPreConditions_Managed_JobSetCRDNotInstalled (expects JobSetCRDMissingMessage error) and TestCheckPreConditions_Managed_JobSetCRDInstalled (expects no error); configure fake scheme/CRD and operator condition.
Status constant: internal/controller/status/status.go | Add exported constant JobSetCRDMissingMessage with instruction to create JobSetOperator CR when JobSet CRD is missing.
GVK definition: pkg/cluster/gvk/gvk.go | Add exported JobSetv1alpha2 schema.GroupVersionKind for jobset.x-k8s.io/v1alpha2/JobSet.
E2E tests: tests/e2e/creation_test.go | Add ValidateResourcesCreation(t *testing.T) to wait for JobSetOperator Available and add a test case verifying required resources are created.
E2E helper: tests/e2e/helper_test.go | Add CreateJobSetOperator() *unstructured.Unstructured helper constructing an operator.openshift.io/v1 JobSetOperator unstructured object with default spec fields.
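
As a rough illustration of the E2E helper row above, the object shape might look like the sketch below. It uses a plain `map[string]any` in place of `*unstructured.Unstructured` so it runs with the standard library alone. The apiVersion and kind come from the summary; the resource name `cluster` and the `managementState` spec field are assumptions, since the summary only says "default spec fields".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// newJobSetOperator returns a plain map shaped like an unstructured object.
// The function name and its parameter are hypothetical.
func newJobSetOperator(name string) map[string]any {
	return map[string]any{
		"apiVersion": "operator.openshift.io/v1",
		"kind":       "JobSetOperator",
		"metadata":   map[string]any{"name": name},
		// Assumed spec: the PR summary does not list the actual fields.
		"spec": map[string]any{"managementState": "Managed"},
	}
}

func main() {
	out, err := json.MarshalIndent(newJobSetOperator("cluster"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```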

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus review on:
    • checkPreConditions error construction and StopError propagation.
    • Unit test setup for fake scheme/CRD and use of gvk.JobSetv1alpha2.
    • Consistency and naming of JobSetCRDMissingMessage.
    • E2E additions: correctness of CreateJobSetOperator shape and the wait logic in ValidateResourcesCreation.

Poem

🐰 I sniffed the cluster, hopped around the sod,
Searched for JobSet under root and clod.
If it's absent, I gently sound the bell—
"Create the CR, then all will be well!"
Tests and helpers join the happy waltz.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a precondition check for the JobSet CRD to the Trainer v2 controller, which aligns with the changeset modifications across multiple files.
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/controller/components/trainer/trainer_controller_actions.go (1)

22-25: JobSet CRD precondition wiring looks correct; minor error message nit

The control flow (operator check first, then CRD check with cluster.HasCRD and StopError on missing CRD) is sound and matches the PR intent; no functional issues spotted.

The formatted error string is slightly awkward though:

return odherrors.NewStopError("failed to check %s CRDs version: %w", gvk.JobSetv1alpha2, err)

Consider tightening it to “CRD version”:

- return odherrors.NewStopError("failed to check %s CRDs version: %w", gvk.JobSetv1alpha2, err)
+ return odherrors.NewStopError("failed to check %s CRD version: %w", gvk.JobSetv1alpha2, err)

Also applies to: 38-45

internal/controller/components/trainer/trainer_controller_actions_test.go (1)

9-18: New JobSet CRD precondition tests are solid; optional setup deduplication

The two new tests exercise the intended scenarios well:

  • TestCheckPreConditions_Managed_JobSetCRDNotInstalled: validates that, with the JobSet operator present but no CRD, checkPreConditions fails and surfaces JobSetCRDMissingMessage.
  • TestCheckPreConditions_Managed_JobSetCRDInstalled: sets up a fake scheme, a JobSet CRD with the expected stored version, and an OperatorCondition, then confirms checkPreConditions succeeds.

This gives good coverage of the newly introduced CRD check while keeping the existing “operator not installed” case intact.

If you want to tidy things up further, you could extract a small helper to build the ReconciliationRequest (trainer instance + conditions manager) to reduce the repeated boilerplate across the three tests, but that’s purely optional.

Also applies to: 44-111
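
One possible shape for the optional helper is a small builder with functional options, driven by a table of cases. Everything below is hypothetical: `reconciliationRequest` is a trimmed-down stand-in for the real request type, and `checkPreConditions` only imitates the operator-then-CRD ordering described in this review.

```go
package main

import (
	"errors"
	"fmt"
)

// reconciliationRequest is a hypothetical, trimmed-down stand-in.
type reconciliationRequest struct {
	operatorInstalled bool
	crdInstalled      bool
}

// requestOption mutates a request while it is being built.
type requestOption func(*reconciliationRequest)

func withOperator() requestOption {
	return func(rr *reconciliationRequest) { rr.operatorInstalled = true }
}

func withCRD() requestOption {
	return func(rr *reconciliationRequest) { rr.crdInstalled = true }
}

// newRequest centralizes the setup the three tests would otherwise repeat.
func newRequest(opts ...requestOption) *reconciliationRequest {
	rr := &reconciliationRequest{}
	for _, opt := range opts {
		opt(rr)
	}
	return rr
}

// checkPreConditions imitates the ordering: operator first, then CRD.
func checkPreConditions(rr *reconciliationRequest) error {
	if !rr.operatorInstalled {
		return errors.New("operator not installed")
	}
	if !rr.crdInstalled {
		return errors.New("JobSet CRD missing")
	}
	return nil
}

func main() {
	cases := []struct {
		name    string
		opts    []requestOption
		wantErr bool
	}{
		{"operator not installed", nil, true},
		{"CRD not installed", []requestOption{withOperator()}, true},
		{"CRD installed", []requestOption{withOperator(), withCRD()}, false},
	}
	for _, tc := range cases {
		err := checkPreConditions(newRequest(tc.opts...))
		fmt.Printf("%s: gotErr=%t wantErr=%t\n", tc.name, err != nil, tc.wantErr)
	}
}
```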

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a4994c and 1913800.

📒 Files selected for processing (4)
  • internal/controller/components/trainer/trainer_controller_actions.go (2 hunks)
  • internal/controller/components/trainer/trainer_controller_actions_test.go (2 hunks)
  • internal/controller/status/status.go (1 hunks)
  • pkg/cluster/gvk/gvk.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/controller/components/trainer/trainer_controller_actions.go (4)
pkg/cluster/resources.go (1)
  • HasCRD (366-368)
pkg/cluster/gvk/gvk.go (1)
  • JobSetv1alpha2 (641-645)
pkg/controller/actions/errors/errors.go (1)
  • NewStopError (20-24)
internal/controller/status/status.go (1)
  • JobSetCRDMissingMessage (183-183)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build/push catalog image
  • GitHub Check: Run tests and collect coverage on internal and pkg
  • GitHub Check: kube-linter
  • GitHub Check: golangci-lint
🔇 Additional comments (2)
internal/controller/status/status.go (1)

181-184: JobSet CRD missing status message is consistent and clear

The new JobSetCRDMissingMessage fits well with the existing status-message conventions and clearly distinguishes the “CRD missing” case from “operator not installed”. No issues from a correctness or UX standpoint.

pkg/cluster/gvk/gvk.go (1)

641-645: JobSetv1alpha2 GVK definition matches expected CRD identity

Group, Version, and Kind for JobSetv1alpha2 line up with the CRD name used in tests (jobsets.jobset.x-k8s.io) and what HasCRD expects. This should integrate cleanly with other GVK constants.
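
The relationship between the GVK and the CRD name can be sketched as below. The `groupVersionKind` type is a local stand-in for `schema.GroupVersionKind`, and `crdName` is a hypothetical helper; appending "s" to the lowercased kind is a naive pluralization that happens to work for JobSet and is not how every CRD name is formed.

```go
package main

import (
	"fmt"
	"strings"
)

// groupVersionKind is a local stand-in for schema.GroupVersionKind.
type groupVersionKind struct{ Group, Version, Kind string }

// crdName builds "<plural>.<group>" using naive "+s" pluralization.
func crdName(g groupVersionKind) string {
	return strings.ToLower(g.Kind) + "s." + g.Group
}

func main() {
	jobSet := groupVersionKind{Group: "jobset.x-k8s.io", Version: "v1alpha2", Kind: "JobSet"}
	fmt.Println(crdName(jobSet)) // prints "jobsets.jobset.x-k8s.io"
}
```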

@codecov

codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.08%. Comparing base (eb8fd29) to head (a50809b).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
...r/components/trainer/trainer_controller_actions.go | 60.00% | 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2951      +/-   ##
==========================================
+ Coverage   47.06%   47.08%   +0.01%     
==========================================
  Files         148      148              
  Lines       10443    10448       +5     
==========================================
+ Hits         4915     4919       +4     
  Misses       4981     4981              
- Partials      547      548       +1     

@sutaakar sutaakar marked this pull request as ready for review December 5, 2025 11:34
@openshift-ci openshift-ci bot requested review from den-rgb and lphiri December 5, 2025 11:34
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26c1124 and a50809b.

📒 Files selected for processing (6)
  • internal/controller/components/trainer/trainer_controller_actions.go (2 hunks)
  • internal/controller/components/trainer/trainer_controller_actions_test.go (2 hunks)
  • internal/controller/status/status.go (1 hunks)
  • pkg/cluster/gvk/gvk.go (1 hunks)
  • tests/e2e/creation_test.go (2 hunks)
  • tests/e2e/helper_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/controller/status/status.go
  • internal/controller/components/trainer/trainer_controller_actions.go
  • internal/controller/components/trainer/trainer_controller_actions_test.go
  • pkg/cluster/gvk/gvk.go
🧰 Additional context used
🧬 Code graph analysis (1)
tests/e2e/creation_test.go (3)
tests/e2e/resource_options_test.go (3)
  • WithObjectToCreate (121-133)
  • WithCondition (262-266)
  • WithCustomErrorMsg (287-291)
tests/e2e/helper_test.go (1)
  • CreateJobSetOperator (406-420)
pkg/utils/test/matchers/jq/jq_matcher.go (1)
  • Match (11-15)
🔇 Additional comments (1)
tests/e2e/helper_test.go (1)

405-420: LGTM with verification needed.

The helper function correctly constructs an unstructured JobSetOperator resource following the same pattern as other helper functions in this file (e.g., CreateHardwareProfile). The structure is appropriate for e2e test resource creation.

However, please verify the apiVersion and resource configuration as noted in the review comment for creation_test.go.

@openshift-ci openshift-ci bot added the lgtm label Dec 5, 2025
@openshift-ci

openshift-ci bot commented Dec 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CFSNM

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Dec 5, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 95af01c into opendatahub-io:main Dec 5, 2025
22 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in ODH Platform Planning Dec 5, 2025
@sutaakar sutaakar deleted the trainer branch December 5, 2025 15:01