@seisman
Member

@seisman seisman commented Nov 2, 2025

For new contributors, it's frustrating to see so many workflows fail because of the DVC issue reported in #4147.

This PR adds a workaround for this issue: a new workflow, cache_dvc.yaml, which pulls the baseline images from DagsHub and uploads the .dvc/cache directory as a GitHub artifact.
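
For readers unfamiliar with the pattern, a minimal sketch of what such a caching workflow might look like is shown below. It is not the file added in this PR: the trigger paths, the artifact name `dvc-cache`, the action versions, and the job-level permissions are all assumptions made for illustration. Only `permissions: {}`, the `dvc_cache` job name, and the `.dvc/cache` location come from this conversation.

```yaml
# Hypothetical sketch of a cache_dvc.yaml-style workflow (not the merged file).
name: Cache DVC files

on:
  push:
    branches: [main]
    paths:
      - "**/*.dvc"        # assumed glob: re-run whenever a .dvc file changes
  workflow_dispatch:       # assumed: also allow manual runs

permissions: {}

jobs:
  dvc_cache:
    runs-on: ubuntu-latest
    permissions:
      contents: read       # assumed: only needed to check out the repository
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up DVC
        uses: iterative/setup-dvc@v1

      - name: Pull baseline images from DagsHub
        run: dvc pull

      - name: Upload DVC cache as a GitHub artifact
        uses: actions/upload-artifact@v4
        with:
          name: dvc-cache          # assumed artifact name
          path: .dvc/cache/
          # .dvc is a hidden directory, which upload-artifact skips by default
          include-hidden-files: true
```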

For PRs from forks, the dvc pull command fails, so those workflows can fall back to downloading the DVC cache from the GitHub artifact instead.
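
A rough sketch of that fallback inside a test job might look like the steps below. The step id `dvc-pull` and the `if:` condition match the snippet reviewed in this PR; everything else is an assumption for illustration, including the use of the `gh` CLI to locate and download the artifact, the artifact name `dvc-cache`, and the final `dvc checkout`. The job is also assumed to grant the token `actions: read`.

```yaml
# Hypothetical fallback steps (sketch only; the merged workflow may differ).
      - name: Pull baseline images from DagsHub
        id: dvc-pull
        # Let the job continue so the fallback step below can run
        continue-on-error: true
        run: dvc pull
        env:
          DAGSHUB_TOKEN: ${{ secrets.DAGSHUB_TOKEN }}

      - name: Download DVC cache as artifacts from GitHub
        # "outcome" is the step result before continue-on-error is applied
        if: steps.dvc-pull.outcome == 'failure'
        run: |
          # Find the latest successful run of the caching workflow on main
          run_id=$(gh run list --repo "$GITHUB_REPOSITORY" \
            --workflow cache_dvc.yaml --branch main --status success \
            --limit 1 --json databaseId --jq '.[0].databaseId')
          # Restore the artifact contents into the local DVC cache
          gh run download "$run_id" --repo "$GITHUB_REPOSITORY" \
            --name dvc-cache --dir .dvc/cache
          # Recreate the tracked baseline images from the restored cache
          dvc checkout
        env:
          GH_TOKEN: ${{ github.token }}
```

Checking `outcome` rather than `conclusion` matters here: with `continue-on-error: true`, the step's `conclusion` is reported as `success` even when `dvc pull` fails.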

Below are the workflows that use dvc pull and may fail:

  • ci_tests_dev.yaml
  • ci_tests_dev.yaml
  • dvc-diff.yaml: The cache won't work in this workflow, since the DVC cache reflects the baseline images in the main branch, whereas this workflow needs the baseline images from both the main branch and the current branch. However, PRs from new contributors are unlikely to trigger this workflow, since they can't run 'dvc push' either, so there is no need to apply the workaround.
  • release-baseline-images.yml: Doesn't run in PRs, so there is no need to apply the workaround.

Tests

@seisman seisman added this to the 0.18.0 milestone Nov 2, 2025
@seisman seisman added the maintenance and needs review labels Nov 2, 2025
    DAGSHUB_TOKEN: ${{ secrets.DAGSHUB_TOKEN }}

- name: Download DVC cache as artifacts from GitHub
  if: steps.dvc-pull.outcome == 'failure'
Member

@weiji14 weiji14 left a comment

Thank you! Just a couple of suggestions.

permissions: {}

jobs:
  dvc_cache:
Member

Could probably add this as a separate job in .github/workflows/cache_data.yaml if we want to reduce the number of workflow files.

Member Author

These two jobs have different trigger conditions. The cache_data workflow is usually scheduled to run weekly but the cache_dvc workflow needs to run when .dvc files change.
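
To illustrate the difference, the two `on:` blocks might look roughly like the sketch below. The cron expression, branch name, and path glob are assumptions made for illustration, not copied from the actual workflow files.

```yaml
# cache_data.yaml (sketch): refreshed on a weekly schedule
on:
  schedule:
    - cron: "0 0 * * 0"    # assumed time: every Sunday at 00:00 UTC
---
# cache_dvc.yaml (sketch): refreshed whenever tracked .dvc files change
on:
  push:
    branches: [main]       # assumed branch filter
    paths:
      - "**/*.dvc"         # assumed glob for the .dvc files
```

Because triggers are defined per workflow file rather than per job, merging the two jobs into one file would make both run on every trigger unless each job were gated with its own `if:` condition.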

Member

Ah yes, ok to keep them separate then.

@seisman seisman removed the needs review label Nov 3, 2025
@seisman seisman merged commit 444d3da into main Nov 3, 2025
17 of 19 checks passed
@seisman seisman deleted the ci/dvc branch November 3, 2025 02:30