Try out if kro could be feasible as deployment-tool #164


Closed
9 of 10 tasks
frewilhelm opened this issue Mar 26, 2025 · 11 comments

@frewilhelm
Contributor

frewilhelm commented Mar 26, 2025

Description

As described in #149, the ocm-controllers require something to deploy OCM resources. The v1 controllers used a custom operator, FluxDeployer, to achieve this.

We want to try out whether kro could be our deployer for the v2 controllers, as kro is a resource orchestration tool that offers several ways to alter and configure values in deployments. In combination with FluxCD, kro could cover our deployment use-cases.

We need the following use-cases to work:

  • simple helm: Use FluxCD to deploy a Helm chart contained in an OCM resource

  • simple kustomize: Use FluxCD to deploy a kustomization contained in an OCM resource

  • simple configuration helm: Use FluxCD to deploy a Helm chart and configure values inside the Helm chart

  • simple configuration kustomization: Use FluxCD to deploy a kustomization and configure values inside the manifests

  • A complicated one containing

    • configuration
    • localisation
    • instance-spec parameters
    • passing a secret
    • (Also include the replication)
  • RGD in CV

    • Requires new operator, e.g. OCMDeployer that deploys the RGD

How would we recommend using the kro resources?

  • ???

How can resources be upgraded/updated?

Timebox: 3 day(s)

Done Criteria

  • Estimation of impact on existing code incl. tests
  • Estimation of impact on Enduser Documentation updated (if applicable)
  • Estimation of impact on Internal technical Documentation created/updated (if applicable)
  • Created refinable tasks for the actual implementation
@frewilhelm frewilhelm moved this from 🆕 ToDo to 🏗 In Progress in OCM Backlog Board Mar 26, 2025
@frewilhelm frewilhelm added needs/validation Validate the issue and assign a priority needs/refinement Discuss with the team and gain a shared understanding labels Mar 26, 2025
@ikhandamirov ikhandamirov removed needs/validation Validate the issue and assign a priority needs/refinement Discuss with the team and gain a shared understanding labels Mar 27, 2025
@frewilhelm
Contributor Author

Estimations

Estimation of impact on existing code incl. tests

  • Configuration and localisation could be replaced by kro ResourceGraphDefinition CEL statements
    • Configuration + localisation happen while deploying (no intermediate results are stored anymore)
      • Values for configuration and localisation (e.g. image locations, ...) can be hardcoded, passed through kro's instance, or resolved dynamically by referencing other k8s resources (as long as they are known in the cluster and accessible inside the graph).
    • To localise, we deploy an OCM resource CRD and store the access information of its source OCI reference in its status. Accordingly, we do not need an intermediate layer to provide the resources, as the original source is used.
      • works only for OCI artifacts
      • kro does not allow referencing JSON apiextension.RAW fields (problem: dynamic fields that are not known in advance). This raises an error in the ResourceGraphDefinition, as its dry run (which checks whether fields exist) fails accordingly.
        • API changes are required
    • All CRDs for configuration and localisation, e.g. localisation-rules, could be omitted.
  • The OCI storage backend implementation + zot-registry could be removed, assuming we don't need to store any resources from localisation or configuration (compatibility layer)
    • OCM component descriptors (lists) cannot be stored anymore; adjustments are needed
    • Omitting the storage backend means that localBlob resources cannot be deployed, as every resource needs a source OCI registry from which it can be fetched.
  • e2e-tests must be adjusted
    • Deployment of kro
    • Replace current resources with ResourceGraphDefinitions
  • "unit" controller tests must be adjusted to work without the storage
  • Config inheritance could be omitted, as it can be specified in the ResourceGraphDefinition directly. This would reduce complexity around the propagation policy of each config.
  • Probably requires the possibility to pack a ResourceGraphDefinition into an OCM component version and use it for deployment
    • Requires a new operator
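
As a minimal illustration of the first point above: values in an RGD template can be hardcoded, passed through the kro instance, or referenced from other resources in the graph. The chart values and resource id below are examples only; the `status.ociArtifact` path follows the Resource example further down in this thread.

```yaml
# Illustrative snippet from a hypothetical RGD HelmRelease template:
values:
  replicaCount: 2                              # hardcoded in the RGD itself
  ui:
    message: ${schema.spec.podinfo.message}    # passed in via the kro instance spec
  image:
    # referenced dynamically from another resource in the graph
    repository: ${resourceImage.status.ociArtifact.sourceReference.registry}
```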

Estimation of impact on Enduser Documentation updated (if applicable)

Estimation of impact on Internal technical Documentation created/updated (if applicable)

  • Technical documentation is already outdated and requires some work either way

Created refinable tasks for the actual implementation

  • As this spike is part of an ADR (Create a Deployment ADR #136), the refinable tasks will be part of the decision in this ADR. A quick overview is presented in the "Estimation of impact on existing code incl. tests" section above.

@frewilhelm
Contributor Author

Current progress is saved in https://github.com/frewilhelm/ocm-k8s-toolkit/tree/spike_kro (based on #98)

@frewilhelm
Contributor Author

A potential blocker could be that instances are not reconciled when their graph is updated. Thus, changes will not be propagated to the resources.

However, there are at least two issues, one of them a feature request from the maintainers themselves, that address this problem and aim to fix it.

@ikhandamirov
Contributor

Wow, sounds like a huge simplification!

@frewilhelm
Contributor Author

In a scenario in which the ResourceGraphDefinition is part of the CV, there is a potential race condition:

  • CV contains
    • OCM Resource (e.g. HelmChart to-be-deployed)
    • ResourceGraphDefinition that contains
      • k8s OCM resource (refers to OCM Resource in CV)
      • FluxCD OCI Repository (refers to location of CV stored in OCI registry)
      • FluxCD HelmRelease (points to FluxCD OCI Repository)

To deploy the resource, the user has to deploy the k8s resources OCMRepository (points to OCI registry in which the CV is stored), component (points to the component name and OCMRepository), resource (points to ResourceGraphDefinition in the CV and component), and a new CRD OCMDeployer (or the like) that references the resource for the ResourceGraphDefinition. The OCMDeployer takes the manifest of the ResourceGraphDefinition and deploys it. After creating an instance for the new kind from the ResourceGraphDefinition, kro will deploy the OCM resource (the HelmChart) using the resources for OCM and Flux from the ResourceGraphDefinition.
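
The chain described above might look roughly as follows. This is a sketch, not an existing API: the OCMDeployer kind does not exist yet, and the field names for OCMRepository and Component are assumptions; only the Resource fields follow the RGD example later in this thread.

```yaml
# Illustrative only: OCMDeployer is hypothetical, and the field names
# of OCMRepository/Component are assumptions for this sketch.
apiVersion: delivery.ocm.software/v1alpha1
kind: OCMRepository
metadata:
  name: my-repository
spec:
  baseUrl: ghcr.io/my-org          # OCI registry in which the CV is stored
---
apiVersion: delivery.ocm.software/v1alpha1
kind: Component
metadata:
  name: my-component
spec:
  repositoryRef:
    name: my-repository
  component: ocm.software/my-component
---
apiVersion: delivery.ocm.software/v1alpha1
kind: Resource
metadata:
  name: rgd-resource
spec:
  componentRef:
    name: my-component
  resource:
    byReference:
      resource:
        name: rgd                  # the ResourceGraphDefinition stored in the CV
  interval: 10m
---
apiVersion: delivery.ocm.software/v1alpha1
kind: OCMDeployer                  # hypothetical new CRD
metadata:
  name: rgd-deployer
spec:
  resourceRef:
    name: rgd-resource             # the OCMDeployer applies the RGD manifest
```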

The problem arises when the CV is updated to a new version. This triggers an update of the k8s component resource. However, the component resource is watched by (a) the RGD resource, which in turn is watched by the OCMDeployer, and (b) the k8s resource resource that contains the HelmChart. As an example, assume that the k8s resource resource for the HelmChart is removed from the updated CV. The trigger/watch for the RGD resource is fine, as this will just deploy the new graph. But what happens with the original k8s resource resource for the HelmChart? If it reconciles before the RGD is updated, it will fail, as the resource can no longer be found in the component.

This is not necessarily a problem in the context of eventual consistency, as we expect the k8s resource resource for the HelmChart to fail at first and then be removed once the RGD is updated (we assume the RGD is updated as well, since one resource was deleted). But we should keep such scenarios in mind.

@Skarlso
Contributor

Skarlso commented Mar 31, 2025

Also, let's consider the considerable setup complication (needing to install and configure kro), plus the overhead of people learning kro and maintaining the kro version (unless we farm this out to the infra maintainers, which is an additional burden on them in that case).

OCI storage backend implementation + zot-registry could be removed, assuming we don't need to store any resources from localisation or configuration (compatibility layer)

I don't understand this one. :) The registry is a cache and a sync point. It's not just there to share results, but also so that we don't have to re-download a 6 gigabyte image over and over when dealing with the same component or resource. Also, it's not explained how you would work with Flux then. Like, how do you present it with the created artifact that it needs to deploy?

@frewilhelm
Contributor Author

The registry is a cache and a sync point. It's not just there to share results, but it's also there so that we don't have to re-download a 6 gigabyte image when it's being fetched from somewhere over and over dealing with the same component or resource

If we omit the configuration and localisation (or rather move it to kro's ResourceGraphDefinition), then we do not have to download any image at all.

If the image must be available in a specific environment, one could use the replication controller to move the component version to that environment.

Also, it's not explained how you would work with Flux then? Like, how do you present it with the created artifact that it needs to deploy?

Assuming we omit the OCI registry, we would need to publish the original source (= OCI registry, HelmRepository, Git repository, or the like) in the status of the resource. This information can then be consumed via CEL to pass the location, for example, to a FluxCD OCIRepository. Consider the following example of a ResourceGraphDefinition:

apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: complicated-deployment
spec:
  schema:
    apiVersion: v1alpha1
    kind: ComplicatedDeployment
    spec:
      podinfo:
        releaseName: string
        message: string | default="hello, world"
  resources:
    - id: resourceChart
      template:
        apiVersion: delivery.ocm.software/v1alpha1
        kind: Resource
        metadata:
          name: static-resource-chart-name
        spec:
          componentRef:
            name: static-component-name # should be referenced/passed
          resource:
            byReference:
              resource:
                name: helm-resource
          interval: 10m
    - id: resourceImage
      template:
        apiVersion: delivery.ocm.software/v1alpha1
        kind: Resource
        metadata:
          name: static-resource-image-name
        spec:
          componentRef:
            name: static-component-name # should be referenced/passed
          resource:
            byReference:
              resource:
                name: image
          interval: 10m
    - id: ocirepository
      template:
        apiVersion: source.toolkit.fluxcd.io/v1beta2
        kind: OCIRepository
        metadata:
          name: helm-podinfo-config
        spec:
          interval: 1m0s
          layerSelector:
            mediaType: "application/vnd.cncf.helm.chart.content.v1.tar+gzip"
            operation: copy
          url: ${resourceChart.status.ociArtifact.sourceReference.registry}/${resourceChart.status.ociArtifact.sourceReference.repository} 
          ref:
            digest: ${resourceChart.status.ociArtifact.digest}
    - id: helmrelease
      template:
        apiVersion: helm.toolkit.fluxcd.io/v2
        kind: HelmRelease
        metadata:
          name: ${schema.spec.podinfo.releaseName}
        spec:
          releaseName: ${schema.spec.podinfo.releaseName}
          interval: 1m
          timeout: 5m
          chartRef:
            kind: OCIRepository
            name: ${ocirepository.metadata.name}
            namespace: default
          values:
            # Localisation
            image:
              repository: ${resourceImage.status.ociArtifact.sourceReference.registry}/${resourceImage.status.ociArtifact.sourceReference.repository}
              tag: ${resourceImage.status.ociArtifact.sourceReference.reference}
            # Configuration
            ui:
              message: ${schema.spec.podinfo.message}
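
From the schema above, kro generates a ComplicatedDeployment CRD; an instance of it would then look roughly like this (the name and values are examples):

```yaml
# Instance of the CRD that kro generates from the RGD schema above.
apiVersion: kro.run/v1alpha1
kind: ComplicatedDeployment
metadata:
  name: podinfo
  namespace: default
spec:
  podinfo:
    releaseName: podinfo
    # optional: defaults to "hello, world" per the schema
    message: "hello from kro"
```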

@Skarlso
Contributor

Skarlso commented Mar 31, 2025

If the image must be available in a specific environment, one could use the replication controller to move the component version to that environment.

The replication controller was all but archived.

This information can then be taken as CEL to pass the location for example to a FluxCD OCI Repository. Consider the following example of an ResourceGraphDefinition:

How would that work with resources modified by the OCM client during transfer? Which step would that be? So you declare a component version, you declare a target, and then deploy all that with kro, and by the end there would be a target registry, with a status updated to hold the location of the end result in a registry, I assume?

Keep in mind that this all needs to work offline.

@frewilhelm
Contributor Author

Note: A replication cannot be part of an RGD inside the same CV that is supposed to be replicated by that replication.

@Skarlso
Contributor

Skarlso commented Mar 31, 2025

We talked about this on slack. Outcome:

  • since loc/conf won't be part of the main flow anymore, the rationale for the registry becomes much weaker
  • shared some historical reasons behind the local registry, including DMZs where the registry was the target (this can, of course, be mitigated if the infrastructure maintainers run their own registry)
  • shared two concerns:
    • increased burden for the infra maintainers (kro + a local registry in the DMZ case)
    • kro is quite new in the game (about a year old), which might raise concerns with some clients who have strict environment policies

@frewilhelm
Contributor Author

The spike is closed and the implementation will be tracked in #172

@github-project-automation github-project-automation bot moved this from 🏗 In Progress to 🍺 Done in OCM Backlog Board Apr 1, 2025
@ocmbot ocmbot bot added this to the 2025-Q2 milestone Apr 1, 2025