-
Notifications
You must be signed in to change notification settings - Fork 214
Added new component mlflow #2955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe pull request introduces a new MlFlowOperator component to the OpenDataHub operator platform. It adds CRD type definitions, a controller reconciler with initialization and status update logic, integrates the component into DataScienceCluster infrastructure, registers RBAC rules, updates manifests and monitoring scaffolds, and includes end-to-end tests. Changes
Sequence DiagramsequenceDiagram
actor User
participant DSC as DataScienceCluster<br/>(v2)
participant Ctrl as MlFlowOperator<br/>Controller
participant K8s as Kubernetes API
participant Rec as Reconciler
User->>DSC: Create/Update DSC with<br/>MlFlowOperator enabled
DSC->>Ctrl: Trigger watch event
Ctrl->>Rec: NewComponentReconciler
Rec->>K8s: Initialize (append manifest)
Rec->>K8s: ApplyParams (apply image/config)
Rec->>K8s: Kustomize & Deploy<br/>(ConfigMap, Deployment, etc.)
K8s->>Rec: Deployment created/ready
Rec->>DSC: UpdateDSCStatus<br/>(set Ready condition)
DSC-->>User: Status updated<br/>with MlFlowOperator state
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Areas requiring extra attention:
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR can't be merged just yet 😢Please run For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/19890958495 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/api-overview.md (1)
1-37: Pipeline failure: Regenerate API documentation.The CI pipeline indicates that
docs/api-overview.mdhas uncommitted generated changes. This file appears to be auto-generated from API types and needs to be regenerated before merging.Run the documentation generation command (likely
make generate-docsor similar) and commit the updated file.#!/bin/bash # Find the make target for generating API docs rg -n "api-overview" Makefile # Check for doc generation scripts fd -t f "generate" --exec grep -l "api-overview" {} \;
🧹 Nitpick comments (3)
get_all_manifests.sh (1)
31-32: Placeholder manifest entries require completion.The commented MLflow operator manifest entries include TODO markers indicating that actual repository details need to be added before this component can fetch manifests.
Do you want me to help generate a tracking issue for completing the manifest repository configuration, or would you like guidance on the expected format based on other component entries?
internal/controller/components/mlflowoperator/monitoring/mlflowoperator-alerting.unit-tests.yaml (1)
1-25: Alerting unit-test scaffold is fine; remember to flesh it out before RHOAI release.The commented example for
MlflowOperatorDownprovides a useful template; just ensure concrete Prometheus alert unit tests are added here when enabling MLflow Operator alerting for RHOAI.config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml (1)
97-101: Consider standardizing the displayName to "MLflow Operator".The displayName "Ml Flow Operator" uses space separation, but MLflow is typically written as one word (capital M, capital L, lowercase flow). For consistency with the official MLflow branding:
- displayName: Ml Flow Operator + displayName: MLflow Operator
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
PROJECT(1 hunks)api/components/v1alpha1/mlflowoperator_types.go(1 hunks)api/components/v1alpha1/zz_generated.deepcopy.go(2 hunks)api/datasciencecluster/v1/datasciencecluster_conversion.go(3 hunks)api/datasciencecluster/v2/datasciencecluster_types.go(2 hunks)api/datasciencecluster/v2/zz_generated.deepcopy.go(2 hunks)cmd/component-codegen/README.md(1 hunks)cmd/component-codegen/cmd/generator/generator.go(1 hunks)cmd/main.go(1 hunks)config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml(1 hunks)docs/api-overview.md(34 hunks)get_all_manifests.sh(2 hunks)internal/controller/components/mlflowoperator/mlflowoperator.go(1 hunks)internal/controller/components/mlflowoperator/mlflowoperator_controller.go(1 hunks)internal/controller/components/mlflowoperator/mlflowoperator_controller_actions.go(1 hunks)internal/controller/components/mlflowoperator/mlflowoperator_support.go(1 hunks)internal/controller/components/mlflowoperator/monitoring/mlflowoperator-alerting.unit-tests.yaml(1 hunks)internal/controller/components/mlflowoperator/monitoring/mlflowoperator-prometheusrules.tmpl.yaml(1 hunks)internal/controller/datasciencecluster/kubebuilder_rbac.go(1 hunks)pkg/cluster/gvk/gvk.go(2 hunks)tests/e2e/controller_test.go(1 hunks)tests/e2e/helper_test.go(2 hunks)tests/e2e/mlflowoperator_test.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
api/datasciencecluster/v2/datasciencecluster_types.go (2)
api/components/v1alpha1/mlflowoperator_types.go (3)
MlFlowOperator(41-47)DSCMlFlowOperator(82-85)DSCMlFlowOperatorStatus(88-91)pkg/cluster/gvk/gvk.go (1)
MlFlowOperator(329-333)
internal/controller/components/mlflowoperator/mlflowoperator_controller.go (7)
api/components/v1alpha1/mlflowoperator_types.go (2)
MlFlowOperator(41-47)MlFlowOperatorInstanceName(11-11)pkg/cluster/gvk/gvk.go (4)
MlFlowOperator(329-333)Deployment(113-117)MLflow(647-651)CustomResourceDefinition(335-339)pkg/controller/predicates/resources/resources.go (1)
NewDeploymentPredicate(45-47)pkg/controller/handlers/handlers.go (1)
ToNamed(58-66)pkg/metadata/labels/types.go (1)
ODH(37-45)api/components/component.go (1)
Component(10-12)internal/controller/components/mlflowoperator/mlflowoperator_support.go (1)
ComponentName(11-11)
internal/controller/components/mlflowoperator/mlflowoperator_controller_actions.go (1)
pkg/controller/types/types.go (1)
ReconciliationRequest(74-110)
tests/e2e/mlflowoperator_test.go (5)
tests/e2e/components_test.go (2)
ComponentTestCtx(28-34)NewComponentTestCtx(43-61)api/components/v1alpha1/mlflowoperator_types.go (1)
MlFlowOperator(41-47)pkg/cluster/gvk/gvk.go (1)
MlFlowOperator(329-333)tests/e2e/controller_test.go (1)
TestCase(110-113)tests/e2e/helper_test.go (1)
RunTestCases(98-123)
api/components/v1alpha1/mlflowoperator_types.go (2)
api/common/types.go (4)
ComponentReleaseStatus(127-133)Status(97-105)Condition(36-94)ComponentRelease(117-123)pkg/cluster/gvk/gvk.go (1)
MlFlowOperator(329-333)
api/datasciencecluster/v2/zz_generated.deepcopy.go (2)
api/components/v1alpha1/mlflowoperator_types.go (1)
MlFlowOperator(41-47)pkg/cluster/gvk/gvk.go (1)
MlFlowOperator(329-333)
tests/e2e/controller_test.go (1)
api/components/v1alpha1/mlflowoperator_types.go (1)
MlFlowOperatorComponentName(9-9)
api/components/v1alpha1/zz_generated.deepcopy.go (2)
api/components/v1alpha1/mlflowoperator_types.go (8)
DSCMlFlowOperator(82-85)MlFlowOperatorCommonSpec(15-16)DSCMlFlowOperatorStatus(88-91)MlFlowOperatorCommonStatus(23-25)MlFlowOperator(41-47)MlFlowOperatorList(72-76)MlFlowOperatorSpec(18-20)MlFlowOperatorStatus(28-31)pkg/cluster/gvk/gvk.go (1)
MlFlowOperator(329-333)
tests/e2e/helper_test.go (3)
api/components/v1alpha1/mlflowoperator_types.go (2)
MlFlowOperator(41-47)DSCMlFlowOperator(82-85)pkg/cluster/gvk/gvk.go (1)
MlFlowOperator(329-333)api/common/types.go (1)
ManagementSpec(12-23)
internal/controller/components/mlflowoperator/mlflowoperator_support.go (4)
api/components/v1alpha1/mlflowoperator_types.go (2)
MlFlowOperatorComponentName(9-9)MlFlowOperatorKind(12-12)internal/controller/status/status.go (2)
ReadySuffix(112-112)ConditionDeploymentsAvailable(81-81)pkg/controller/types/types.go (1)
ManifestInfo(46-50)pkg/deploy/deploy.go (1)
DefaultManifestPath(49-49)
🪛 GitHub Actions: Check config and readme updates
docs/api-overview.md
[error] 1-1: Generated files have been missed in the PR. The diff indicates changes to docs/api-overview.md that are not committed.
🪛 markdownlint-cli2 (0.18.1)
docs/api-overview.md
1074-1074: Bare URL used
(MD034, no-bare-urls)
1075-1075: Bare URL used
(MD034, no-bare-urls)
1109-1109: Link fragments should be valid
(MD051, link-fragments)
1140-1140: Link fragments should be valid
(MD051, link-fragments)
1141-1141: Link fragments should be valid
(MD051, link-fragments)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build/push catalog image
- GitHub Check: golangci-lint
- GitHub Check: kube-linter
- GitHub Check: Run tests and collect coverage on internal and pkg
🔇 Additional comments (31)
internal/controller/datasciencecluster/kubebuilder_rbac.go (1)
300-303: LGTM! RBAC markers correctly configured for MlFlowOperator.The kubebuilder RBAC annotations follow the standard pattern used by other components, covering the base resource, status subresource, and finalizers with appropriate verbs.
PROJECT (1)
147-154: LGTM! PROJECT resource entry correctly configured.The MlFlowOperator API resource entry follows the standard kubebuilder PROJECT file format and is consistent with other component entries.
api/datasciencecluster/v1/datasciencecluster_conversion.go (2)
126-130: LGTM! Conversion logic correctly handles backward compatibility.The MlFlowOperator initialization in ConvertTo with
ManagementState: Removedis the correct pattern for components that don't exist in v1, ensuring smooth v1→v2 DSC conversion.
161-165: LGTM! Status conversion follows the established pattern.The MlFlowOperatorStatus initialization mirrors the spec conversion, maintaining consistency between spec and status during v1→v2 conversion.
pkg/cluster/gvk/gvk.go (2)
329-333: LGTM! MlFlowOperator GVK correctly references component API constants.The MlFlowOperator GVK properly derives its group and version from componentApi, maintaining consistency with other component resources.
647-651: Verify the MLflow GVK is required for this PR.This introduces a separate
MLflowGVK with groupmlflow.opendatahub.io, distinct from theMlFlowOperatorcomponent. Clarify whether:
- The MLflow CRD comes from an external mlflow-operator that this component will manage
- This GVK is actively used in the controller implementation, or if it should be added when MLflow resource management is implemented
- There are corresponding CRD definitions or controller logic that references this GVK
Check the codebase for
gvk.MLflowreferences in controller code and verify this addition aligns with the PR's scope.api/components/v1alpha1/mlflowoperator_types.go (5)
8-13: LGTM! Constants follow naming conventions.The component name, instance name, and kind constants are correctly defined and follow the established patterns used by other components.
15-31: Empty spec and status are appropriate for initial scaffolding.The MlFlowOperatorSpec and MlFlowOperatorStatus types are currently empty (aside from embedded common types), which is expected for generated boilerplate code. Actual configuration fields will be added when implementing the component's functionality.
33-39: LGTM! Kubebuilder markers correctly configure the CRD.The markers properly configure:
- Cluster-scoped resource
- Status subresource
- Singleton validation (name must be "default-mlflowoperator")
- Print columns for Ready status and Reason
49-67: LGTM! Helper methods correctly implement status interfaces.The methods properly access embedded status fields and follow the pattern used by other components for condition and release status management.
82-91: LGTM! DSC embedding types follow established patterns.The DSCMlFlowOperator and DSCMlFlowOperatorStatus types correctly embed ManagementSpec for DSC integration, consistent with other components.
cmd/component-codegen/README.md (1)
28-28: LGTM! Documentation correctly updated to reference v2 types.The path update aligns with the v2 DataScienceCluster types being the hub version for conversions.
cmd/main.go (1)
94-94: LGTM! Side-effect import correctly registers MlFlowOperator component.The blank import ensures the mlflowoperator package's init function executes to register the component handler, following the standard pattern used by all other components.
tests/e2e/helper_test.go (2)
265-269: MlFlowOperator wiring in v2 DSC helper is consistent with other components.Initializing
MlFlowOperatorinCreateDSCwithManagementState: operatorv1.Removedmatches the pattern used for other components and aligns with the v2ComponentsAPI surface.
348-352: MlFlowOperator wiring in v1 DSC helper matches v1 Components layout.Adding
MlFlowOperatortoCreateDSCv1with aRemovedmanagement state is consistent with other components and should keep default e2e DSC instances stable.cmd/component-codegen/cmd/generator/generator.go (1)
13-17: Pointing DscTypesPath to v2 DSC types aligns generator with current API.Updating
DscTypesPathto the v2datasciencecluster_types.gokeeps the component codegen in sync with the v2 API (where MlFlowOperator and other newer components live). No issues spotted in how it’s used here.internal/controller/components/mlflowoperator/mlflowoperator_controller_actions.go (1)
25-36: Initialization correctly registers MlFlowOperator manifests for reconciliation.
initializesimply appendsmanifestPath()torr.Manifests, which is consistent with other component controllers’ setup helpers and keeps room for future precondition logic via the existing TODO.tests/e2e/controller_test.go (1)
119-137: Adding MlFlowOperator to the Components test group is wired correctly.Registering
MlFlowOperatorComponentNamein the first components scenario map cleanly integrates themlflowOperatorTestSuiteinto the e2e runner and CLI flag handling.api/datasciencecluster/v2/zz_generated.deepcopy.go (2)
28-44: DeepCopy for Components.MlFlowOperator matches existing generated patterns.The added
out.MlFlowOperator = in.MlFlowOperatorline is consistent with other value-type component fields and indicates controller-gen was rerun after extendingComponents.
56-72: DeepCopy for ComponentsStatus.MlFlowOperator is correctly wired.Calling
in.MlFlowOperator.DeepCopyInto(&out.MlFlowOperator)aligns with how other status substructures are handled in this generated file.api/datasciencecluster/v2/datasciencecluster_types.go (1)
67-72: MlFlowOperator fields are correctly added to DSC v2 spec and status.Adding
MlFlowOperatorand its status with type-safeDSCMlFlowOperator/DSCMlFlowOperatorStatusand JSON tagmlflowoperatorcleanly exposes the new component through the v2 DataScienceCluster API.Also applies to: 109-114
docs/api-overview.md (1)
1104-1109: Static analysis: Invalid link fragment#componentrelease.The Markdown linter reports that the link fragment
#componentreleaseon Line 1109 may be invalid. Since this is auto-generated documentation, verify that theComponentReleasetype anchor exists in the generated output, or update the generation template if needed.internal/controller/components/mlflowoperator/mlflowoperator_controller.go (1)
43-83: Controller reconciler setup follows established patterns.The reconciler configuration includes appropriate ownership declarations for managed resources, dynamic GVK watching for MLflow CRD, and a standard action pipeline (initialize → releases → kustomize → deploy → deployments → gc) with condition type registration.
Verification of
componentHandlertype andinitializeaction definitions in companion files is required to confirm completeness.tests/e2e/mlflowoperator_test.go (1)
1-37: LGTM! Test suite follows established patterns.The test structure correctly mirrors other component test suites in the codebase:
- Embeds
ComponentTestCtxfor shared test context- Uses
t.Helper()appropriately- Covers essential component lifecycle test cases
However, verify that this test suite is registered in the test controller (e.g., called from a
TestMlFlowOperatorentry point or similar test invocation) to ensure it runs as part of the test suite.internal/controller/components/mlflowoperator/mlflowoperator_support.go (2)
16-29: EmptyimageParamMaprequires verification against actual manifests.The
imageParamMapis currently empty with TODO comments. Before considering this complete, verify that either:
- The MLflow operator manifests don't require image placeholder substitution, or
- This map needs to be populated with placeholder-to-environment-variable mappings
Check the manifests in your params.env or kustomization files to determine if image substitutions are necessary.
36-42: Verify manifest directory path "rhoai" is correctly configured in opendatahub-manifests or RHOAI overlays.OpenDataHub uses kustomize bases/overlays for component manifests, with RHOAI-specific overlays defined in separate repositories. The
SourcePath: "rhoai"should reference a valid directory in either odh-manifests or rhods-operator. Confirm the path resolves correctly and contains the expected kustomize structure for the MLflow/model-registry component.internal/controller/components/mlflowoperator/mlflowoperator.go (4)
35-51: LGTM!The
NewCRObjectimplementation correctly constructs the MlFlowOperator CR with appropriate TypeMeta, ObjectMeta, annotations, and spec from the DataScienceCluster, following the established pattern.
61-63: LGTM!The
IsEnabledcheck correctly evaluates the ManagementState againstoperatorv1.Managed.
80-103: LGTM!The status update logic correctly:
- Normalizes management state
- Resets and populates the MlFlowOperator status
- Marks conditions appropriately based on enablement and existing CR status
53-59: Theplatformparameter is unused in this implementation.This appears to be intentional for interface compliance, as Go requires all implementations of an interface to have matching method signatures. If other component handlers in the codebase follow the same pattern (with
platformalso unused), this is expected and acceptable design.api/components/v1alpha1/zz_generated.deepcopy.go (1)
324-360: LGTM!The autogenerated deepcopy implementations for MlFlowOperator types are consistent with the type definitions and follow the same patterns as other components. The pointer handling in
DSCMlFlowOperatorStatus.DeepCopyInto(line 345-349) correctly creates a new instance and copies into it.Also applies to: 1402-1523
...nal/controller/components/mlflowoperator/monitoring/mlflowoperator-prometheusrules.tmpl.yaml
Outdated
Show resolved
Hide resolved
| var ( | ||
| imageParamMap = map[string]string{ | ||
| // TODO: Add your MLflow operator image parameter mappings here | ||
| // Format: "placeholder-in-manifest": "ENVIRONMENT_VARIABLE_NAME" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carlkyrillos @lburgazzoli what is the right value to put inside imageParamMap?
| Owns(&corev1.ConfigMap{}). | ||
| Owns(&promv1.PodMonitor{}). | ||
| Owns(&rbacv1.ClusterRoleBinding{}). | ||
| Owns(&rbacv1.ClusterRole{}). | ||
| Owns(&corev1.ServiceAccount{}). | ||
| Owns(&corev1.Service{}). | ||
| Owns(&appsv1.Deployment{}, reconciler.WithPredicates(resources.NewDeploymentPredicate())). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these need double check once Humair has code ready
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HumairAK can you verify of owned resources for the mlflow operator?
| // Common MLflow images you might need: | ||
| // "mlflow-server-image": "RELATED_IMAGE_ODH_MLFLOW_SERVER_IMAGE", | ||
| // "mlflow-tracking-server": "RELATED_IMAGE_ODH_MLFLOW_TRACKING_SERVER_IMAGE", | ||
| // "mlflow-operator-image": "RELATED_IMAGE_ODH_MLFLOW_OPERATOR_IMAGE", | ||
| // "postgresql-image": "RELATED_IMAGE_ODH_MLFLOW_POSTGRES_IMAGE", | ||
| // "mlflow-ui-image": "RELATED_IMAGE_ODH_MLFLOW_UI_IMAGE", | ||
| // | ||
| // These placeholders should match what's in your params.env file in the manifests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check with DevOps once they start work on building images
| func manifestPath() types.ManifestInfo { | ||
| return types.ManifestInfo{ | ||
| Path: odhdeploy.DefaultManifestPath, | ||
| ContextDir: ComponentName, | ||
| SourcePath: "rhoai", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check with Humair once his part is done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HumairAK Also this.
...rnal/controller/components/mlflowoperator/monitoring/mlflowoperator-alerting.unit-tests.yaml
Outdated
Show resolved
Hide resolved
...nal/controller/components/mlflowoperator/monitoring/mlflowoperator-prometheusrules.tmpl.yaml
Outdated
Show resolved
Hide resolved
bdcd753 to
ee5ba41
Compare
This PR can't be merged just yet 😢Please run For more info: https://github.com/opendatahub-io/opendatahub-operator/actions/runs/19920975810 |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2955 +/- ##
=======================================
Coverage 47.06% 47.06%
=======================================
Files 148 148
Lines 10443 10443
=======================================
Hits 4915 4915
Misses 4981 4981
Partials 547 547 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
How Has This Been Tested?
Screenshot or short clip
Merge criteria
E2E test suite update requirement
When bringing new changes to the operator code, such changes are by default required to be accompanied by extending and/or updating the E2E test suite accordingly.
To opt-out of this requirement:
E2E update requirement opt-out justificationsection belowE2E update requirement opt-out justification
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.