
CEP XXXX: Build provenance metadata#113

Open: jaimergp wants to merge 16 commits into conda:main from jaimergp:provenance

@jaimergp (Contributor) commented Mar 10, 2025

Checklist for submitter

  • I am submitting a new CEP: Build provenance metadata.
    • I am using the CEP template by creating a copy of cep-0000.md named cep-XXXX.md at the root level.
  • I am submitting modifications to CEP XX.
  • Something else: (add your description here).

Checklist for CEP approvals

  • The vote period has ended and the vote has passed the necessary quorum and approval thresholds.
  • A new CEP number has been minted. Usually, this is ${greatest-number-in-main} + 1.
  • The cep-XXXX.md file has been renamed accordingly.
  • The # CEP XXXX - header has been edited accordingly.
  • The CEP status in the table has been changed to approved.
  • The last modification date in the table has been updated accordingly.
  • The pre-commit checks are passing.

@JeanChristopheMorinPerso left a comment

I'm excited to see this being formalized!

cep-9999.md Outdated
- `remote_url`: Required on CI. Repository URL of the feedstock being built.
- `flow_run_id`: Optional. CI-specific identifier for the workflow run.

For local workflows such as those specified by CFEP 03, `remote_url` MAY be omitted, but authors strongly recommend providing the adequate value manually if necessary.

Suggested change
For local workflows such as those specified by CFEP 03, `remote_url` MAY be omitted, but authors strongly recommend providing the adequate value manually if necessary.
For local workflows such as those specified by [CFEP-03](https://github.com/conda-forge/cfep/blob/main/cfep-03.md), `remote_url` MAY be omitted, but authors strongly recommend providing the adequate value manually if necessary.

Also, if remote_url is omitted, should sha also be omitted?

Contributor Author

Hm, I had assumed users interested in provenance are already using some type of version control, but maybe we can't force that either.

About the dash in CFEP 03, see `name: CEPs must be referred to with 'CEP N' (no dash)`.

Member

Please keep sha mandatory. Some packages may omit remote_url so as not to expose private repositories to the public, but the sha is still helpful for internal audits and attestations, e.g. to check that the sha actually exists, was reviewed as part of a PR, or triggered a CI run.

Contributor Author

I am writing this to describe the current conventions, not to dictate how packages should be built. I think that should be decided separately by packaging organizations. I plan to submit a CFEP for conda-forge where we do "require these fields in CI pipelines, as recommended by CEP XYZ".

Someone using conda-build to share an artifact with their research lab internally may not need to care about whether the recipe is version controlled or what a git hash is.

cep-9999.md Outdated

- `sha`: Required. Commit hash of the feedstock being built.
- `remote_url`: Required on CI. Repository URL of the feedstock being built.
- `flow_run_id`: Optional. CI-specific identifier for the workflow run.

The name flow_run_id originates from Anaconda's current build system. I wish that we could use something more generic and meaningful. But I guess the boat has sailed already? Or do you foresee a future where CF would be willing to change this to something else (on our side at Anaconda, this is very easy to do).

Contributor Author

We can change it easily too, but then we will have to maintain two ways of accessing this metadata because the already stamped artifacts won't be rebuilt. I think flow_run_id is sufficiently generic. I always read it as "(work)flow run ID".

Member

flow_run_id is consistently used by both defaults and conda-forge, and the naming was accepted on both sides in the past. Have a look at this PR: you can see there that every CI system (Azure, Travis, GitHub, ...) names its variable containing the ID differently. flow_run_id was (and is) meant to be agnostic, and the value's prefix tells which automation/CI/workflow system was used.

@jaimergp mentioned this pull request Mar 11, 2025
@chenghlee (Contributor)

Is this an intermediate step towards generating SLSA provenance attestations? If so, should we just go straight in that direction, or would trying to implement SLSA delay the gains we want to get now?

@dbast (Member) commented Mar 11, 2025

thanks @jaimergp for the initiative here.

some background:

@chenghlee yes, it was meant as an intermediate step to enable (multiple different) actual attestations via an attestation worker that uses the data to look things up:

  • process attestation: does the mentioned sha actually exist and was it part of a PR review process?
  • automation attestation: the sha, after the merge to main, triggered a CI build matching the flow_run_id, and the sha256 of the package is also present in the build log -> proof that a package was built automatically without manual intervention.
  • provenance attestation: if the previous attestations are successful, it becomes implicitly clear that the provenance data is correct.

Whether and how those attestations can be stored via e.g. Sigstore in the context of SLSA is a different matter, though.

tl;dr enables attestations that would be hard otherwise without any provenance data.
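
The final lookup in the automation attestation step can be sketched in a few lines. This is a hypothetical Python sketch of only that check, whether the package's SHA-256 digest appears in the build log; it assumes the artifact and the log text are already available locally, and it leaves out fetching the log for a given flow_run_id, confirming that the sha triggered that run, and establishing that the log itself is authentic. The function names are illustrative, not part of any existing tool.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until EOF so large artifacts stay out of memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_appears_in_log(package: Path, build_log: str) -> bool:
    """Check whether the package's SHA-256 digest is printed in the build log.

    A plain, case-insensitive substring search: it only shows that the digest
    was logged, not who produced the log or that the log is genuine.
    """
    return sha256_of_file(package) in build_log.lower()
```

A substring search keeps the sketch independent of any particular CI log format; a real attestation worker would likely parse a known log line instead.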

@jaimergp (Contributor Author)

> Is this an intermediate step towards generating SLSA provenance attestations? If so, should we just go straight in that direction, or would trying to implement SLSA delay the gains we want to get now?

I'm not personally aiming for that; I only wanted to standardise what was otherwise an undocumented convention. I think SLSA provenance can be iterated on later, and this CEP can just reflect the current state. This way we can refer to non-SLSA provenance as "CEP XYZ metadata".

@jaimergp changed the title from "Add CEP for build provenance metadata" to "CEP XXXX: Build provenance metadata" Sep 27, 2025
@jaimergp moved this to In Progress in STA conda & conda-forge Sep 27, 2025
@jaimergp self-assigned this Sep 30, 2025
@jaimergp marked this pull request as ready for review January 28, 2026 16:49
@jaimergp requested a review from a team January 28, 2026 16:55
cep-XXXX.md Outdated

- `sha`: String. Full commit hash of the recipe repository being built.
- `remote_url`: String. CVS URL of the recipe repository being built. HTTP(S) preferred.
- `flow_run_id`: String. CI-specific identifier for the workflow run.

Contributor

Is there an example of how this value can be used? I don't think conda-metadata-app uses it, right?

Contributor Author

We only link to the commit hash, which in the GitHub UI should provide enough information for a user to navigate to the CI workflow (and then manually check the workflow run ID or something). I guess I was just too lazy to postprocess strings like github_1234565435 or azure_79979955940; we'd need more information to build the full URL.
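
Since the prefix names the CI system, splitting it off is the one piece of postprocessing that needs no extra information. A minimal Python sketch, assuming every value follows the `<system>_<id>` shape of the examples mentioned here (github_1234565435, azure_79979955940) and that the system name itself contains no underscore; `FlowRunId` and `parse_flow_run_id` are hypothetical names, not an existing conda or conda-forge API.

```python
from typing import NamedTuple, Optional


class FlowRunId(NamedTuple):
    """A flow_run_id split into its CI-system prefix and native run ID."""

    system: Optional[str]  # e.g. "github" or "azure"; None if no prefix found
    run_id: str            # the CI system's own identifier for the run


def parse_flow_run_id(value: str) -> FlowRunId:
    """Split a value such as 'github_1234565435' at its first underscore.

    Assumes the '<system>_<id>' shape; anything else is returned whole,
    with system=None, rather than guessed at.
    """
    system, sep, run_id = value.partition("_")
    if not sep or not system or not run_id:
        return FlowRunId(system=None, run_id=value)
    return FlowRunId(system=system, run_id=run_id)
```

For example, `parse_flow_run_id("azure_79979955940")` yields `FlowRunId(system='azure', run_id='79979955940')`. Building an actual run URL from that would still need per-system data (organization, project, repository) that the string does not carry.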

Contributor Author

Also since the workflow logs tend to expire, they are not as useful long term. But we could also add other known keys like flow_run_url for build farms to populate.
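
For tools that consume these fields, a small checker could encode the rules quoted in the CEP snippets of this thread: all three values are strings, `sha` is a full commit hash, `remote_url` may be omitted (e.g. for local builds), and HTTP(S) URLs are preferred. This is a hypothetical Python sketch, not part of the CEP: it assumes the fields have already been loaded into a plain dict, treats `sha` as required (as the earlier draft did), and assumes a full hash is 40 (SHA-1) or 64 (SHA-256) lowercase hex characters.

```python
import re
from typing import Any, Dict, List, Tuple

# Assumption: a "full commit hash" is a 40-character (SHA-1) or
# 64-character (SHA-256) lowercase hexadecimal Git object name.
_FULL_SHA = re.compile(r"[0-9a-f]{40}(?:[0-9a-f]{24})?")


def check_provenance(fields: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a dict of provenance fields."""
    errors: List[str] = []
    warnings: List[str] = []

    # All three keys, when present, are documented as strings.
    for key in ("sha", "remote_url", "flow_run_id"):
        if key in fields and not isinstance(fields[key], str):
            errors.append(f"{key} must be a string")

    # Assumption: sha is treated as required, as in the earlier draft.
    sha = fields.get("sha")
    if sha is None:
        errors.append("sha is missing")
    elif isinstance(sha, str) and not _FULL_SHA.fullmatch(sha):
        errors.append("sha is not a full commit hash")

    # remote_url may be omitted; other VCS URLs are allowed, HTTP(S) preferred.
    url = fields.get("remote_url")
    if isinstance(url, str) and not url.startswith(("https://", "http://")):
        warnings.append("remote_url is not an HTTP(S) URL")

    return errors, warnings
```

Separating errors from warnings mirrors the difference between a hard rule and the softer "HTTP(S) preferred" wording; an organization that requires `remote_url` on CI could promote a missing value to an error in its own tooling.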

@baszalmstra (Contributor)

After properly reading this CEP, we immediately updated this on https://prefix.dev!

In the next few days when you click a variant on https://prefix.dev you will see something like:

[Screenshot: prefix.dev variant view showing the provenance metadata]

This is using the provenance metadata from this CEP. If, however, the package has a signed, CEP 27-compliant attestation, you will see:

[Screenshot: prefix.dev variant view for a package with a CEP 27 attestation]

@jaimergp (Contributor Author) commented Feb 16, 2026

Dear @conda/steering-council,

The vote for this CEP has started. It will be open for two weeks, until March 2nd, 2026, 23:59 Anywhere on Earth. This time period has been chosen to make it eligible for time-out rules. As an Enhancement Proposal vote, it requires 60% affirmative votes to pass.

To vote, please mark the relevant checkbox under your username:

@jaimergp (Contributor Author)

@CJ-Wright @mariusvniekerk @chenghlee @marcelotrevisani @msarahan @mbargull @jakirkham, gentle reminder to cast your vote on this CEP.


Development: successfully merging this pull request may close the issue "Provenance metadata".

6 participants