diff --git a/docs/oci-registry.md b/docs/oci-registry.md
new file mode 100644
index 000000000..4d2c4417b
--- /dev/null
+++ b/docs/oci-registry.md
@@ -0,0 +1,65 @@
+# OCI Registry as a Kubeflow Model Registry
+
+## Authors
+
+- Ramkumar Chinchani (Cisco)
+- _TBD_
+
+## Maintainers
+
+- Ramkumar Chinchani (Cisco)
+- _TBD_
+
+## Motivation
+
+According to the [Kubeflow 2023
+survey](https://blog.kubeflow.org/kubeflow-user-survey-2023/), 44% of users
+identified a Model Registry as one of the big gaps in their ML lifecycle
+that is missing from the Kubeflow offering.
+
+![Kubeflow survey](diagrams/model-registry-kubeflowsurvey.png "Kubeflow survey")
+
+## Solution Overview
+
+The [Open Container Initiative](https://opencontainers.org/) (OCI) is a sibling
+organization to CNCF under [The Linux Foundation](https://www.linuxfoundation.org/).
+It maintains the container
+[runtime](https://github.com/opencontainers/runtime-spec),
+[image](https://github.com/opencontainers/image-spec) and
+[distribution](https://github.com/opencontainers/distribution-spec)
+specifications, the vendor-neutral contracts that the Kubernetes ecosystem
+relies on for running containers, laying out image filesystems, and pushing and
+pulling container images.
+
+Recent OCI releases, specifically **v1.1.0** of the
+[_image_](https://github.com/opencontainers/image-spec/releases/tag/v1.1.0) and
+[_distribution_](https://github.com/opencontainers/distribution-spec/releases/tag/v1.1.0)
+specs, add support for pushing arbitrary artifacts and for expressing
+relationships between artifacts.
+
+## OCI v1.1.0 Conformant Registries
+
+OCI artifact registries have the following key properties.
+
+- Container images: these represent workloads and have been the traditional use case for an OCI-conformant registry.
+
+- Artifacts: these represent arbitrary data (ML model data or additional
+  metadata in this context) that can also be pushed to and pulled from an
+  OCI-conformant registry.
+
+- Content-addressable: all data is organized as a Merkle DAG of sha256-hashed
+  blobs, which supports reproducibility.
+
+- Versioning: in addition to its sha256 digest, all data can be tagged with a human-readable version.
+
+- Annotations: arbitrary annotations can be attached to any artifact.
+
+- References: an artifact can now be pushed along with a reference to another
+  artifact (via the `subject` field), which can be leveraged to address the data
+  lineage use case.
+
+
+## References
+
+_TBD_
+