@whatsacomputertho
Member

Skopeo prune

TL;DR - This PR introduces a new command skopeo prune which prunes repo tags prior to a semver or date/time threshold. This also enhances the existing skopeo list-tags command to filter tags by semver or date/time so that you can double check which repo tags will be pruned before pruning.

This might still be a little hacky in its current form (and missing tests / doc, etc.), so I look forward to hearing your feedback. This is somewhat on purpose because I wanted to just get something up and running so that I could get some feedback & iterate on things.

Background

Recently our team was required to migrate between development registries. Our previous registry began to enforce new usage costs requiring us to go through our dev images and prune them, leaving only the latest versions. Our new registry also imposes similar costs, so we will need to continue to stay on top of pruning our old image versions in our CI.

We built custom tooling atop the containers/image library to help drive this large image pruning effort, and I wanted to propose this as a new feature in skopeo as it proved to be very useful for us.

Demo

Tag list filtering

Using skopeo list-tags (default behavior below - all tags are listed)

skopeo-list-tags-default

You can now filter your list of tags by:

Semver

When the --before-version <semver> option is given, the default behavior is to read the version directly from each image tag, assuming the tag can be converted to semver. Tags that cannot be converted to semver are sorted into an invalid category, which can be shown by passing the --invalid flag (this is not demoed below).

skopeo-list-tags-before-version-tag
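
As a rough illustration of the before / after / invalid sorting described above, here is a minimal standalone Go sketch. It is not the code from this PR: the `version` type and the `parseVersion`, `less`, and `partitionByVersion` helpers are hypothetical names, and the parser is deliberately simplified (it accepts an optional `v` prefix and ignores pre-release and build suffixes, which a real semver library would order properly).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a simplified major.minor.patch triple (hypothetical type).
type version [3]int

// parseVersion converts a tag such as "v1.2.3" or "1.2" into a version.
// It reports false when the tag cannot be interpreted as a version.
// Simplification: any "-pre" or "+build" suffix is dropped, not compared.
func parseVersion(tag string) (version, bool) {
	var v version
	s := strings.TrimPrefix(tag, "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	if len(parts) > 3 {
		return v, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, false
		}
		v[i] = n
	}
	return v, true
}

// less reports whether a sorts strictly before b.
func less(a, b version) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// partitionByVersion sorts tags into before / after / invalid buckets
// relative to the threshold version.
func partitionByVersion(tags []string, threshold version) (before, after, invalid []string) {
	for _, tag := range tags {
		v, ok := parseVersion(tag)
		switch {
		case !ok:
			invalid = append(invalid, tag)
		case less(v, threshold):
			before = append(before, tag)
		default:
			after = append(after, tag)
		}
	}
	return before, after, invalid
}

func main() {
	threshold, _ := parseVersion("1.2.0")
	before, after, invalid := partitionByVersion(
		[]string{"v1.0.0", "1.1.5", "1.2.0", "2.0.1", "latest"}, threshold)
	fmt.Println("before: ", before)
	fmt.Println("after:  ", after)
	fmt.Println("invalid:", invalid)
}
```

In this sketch a tag equal to the threshold lands in the after bucket, so only strictly older versions would be candidates for pruning; whether the PR treats the boundary the same way is an assumption.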

Alternatively, you may supply a --version-label <label> argument if you label your images with their semver. This adds overhead, because each tag must be inspected individually to sort it into before / after / invalid. (This is also not demoed; it behaves the same as above, with a small git-like progress counter like the one shown in the following example.)

Date/time

When the --before-time <rfc3339-timestamp> option is given, the default behavior is to use each image's Created date to determine whether the image/tag was created before the given time threshold, sorting the tags into before / after.

skopeo-list-tags-before-time
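
The time-based sorting can be sketched the same way. This is a minimal standalone example, not the PR's implementation: the `taggedImage` type and `partitionByTime` helper are hypothetical, and the Created timestamps are hard-coded here, whereas in practice they would come from inspecting each image.

```go
package main

import (
	"fmt"
	"time"
)

// taggedImage pairs a tag with its image's Created timestamp
// (hypothetical type for illustration).
type taggedImage struct {
	Tag     string
	Created time.Time
}

// partitionByTime sorts tags into before / after buckets depending on
// whether the image was created strictly before the threshold.
func partitionByTime(images []taggedImage, threshold time.Time) (before, after []string) {
	for _, img := range images {
		if img.Created.Before(threshold) {
			before = append(before, img.Tag)
		} else {
			after = append(after, img.Tag)
		}
	}
	return before, after
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(ts string) time.Time {
	t, err := time.Parse(time.RFC3339, ts)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	threshold := mustParse("2025-01-01T00:00:00Z")
	images := []taggedImage{
		{Tag: "nightly-old", Created: mustParse("2024-06-15T12:00:00Z")},
		{Tag: "nightly-new", Created: mustParse("2025-03-01T08:30:00+01:00")},
	}
	before, after := partitionByTime(images, threshold)
	fmt.Println("before:", before)
	fmt.Println("after: ", after)
}
```

Because `time.Time` comparisons are made on the underlying instant, timestamps with different UTC offsets compare correctly against the threshold.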

Pruning

This PR introduces a new skopeo prune command, which prunes image tags prior to a given semver or date/time threshold, using the same behavior as demoed above under the hood. It computes a summary of the storage space freed by pruning, but this can be skipped via the -s / --skip-summary flag. It prompts the user to confirm that they would like to proceed with the prune, but this can be skipped via the -y / --non-interactive flag.

skopeo-prune-before-version

@packit-as-a-service

Ephemeral COPR build failed. @containers/packit-build please check.

@whatsacomputertho
Member Author

I see there are some failures in CI & conflicts to resolve. Getting late here - I will circle back to those tomorrow. I appreciate any sort of general / specific feedback folks may want to give in the meantime!

Contributor

@mtrmac mtrmac left a comment

Thanks!

I didn’t read all of this, but just at a conceptual level, I think this doesn’t work as expected (fixable) and, given the current registry APIs, this whole feature set shouldn’t be built at this time, even if it were fixed to delete images correctly.

I think the best path forward is either to build things like this as a part of some higher-level build system / product maintenance system that manages the products and published versions as a whole — not just delete the final images; or to work in https://github.com/opencontainers/distribution-spec to allow doing that filtering in a reasonable number of HTTP requests instead of by inspecting one image at a time.

}
}()
unparsedInstance := image.UnparsedInstance(src, nil)
img, err := image.FromUnparsedImage(ctx, sys, unparsedInstance)
Contributor

I think this (listing potentially thousands of tags, and launching thousands of metadata fetch tasks — in regularly running prune jobs) just doesn’t scale, and providing a tool that hammers a registry with requests like this is not being a good member of an ecosystem.

Sure, anyone can implement a loop like this — as you just did — but I think it’s not something that should be widely deployed. Users need to be steered to either using product-specific registry APIs, or to maintain the information in some other location, probably as part of a build / product maintenance system. Or, ideally, the cross-vendor API could be extended to do this natively / server-side.

}

// Delete the image corresponding to the reference
err = ref.DeleteImage(ctx, sys)
Contributor

Given what DeleteImage currently does ( #1432 ) I think this is fundamentally unsuitable for the stated purpose.

That could certainly be fixed, but this must not be merged as is.

}
}()
unparsedInstance := image.UnparsedInstance(src, nil)
img, err := image.FromUnparsedImage(ctx, sys, unparsedInstance)
Contributor

(BTW this inspects one platform image out of possibly many, with the data possibly inconsistent between the per-platform variants. That’s a ~pedantic conceptual problem, I don’t know that it can be fixed, in practice assuming that all the per-platform variants have consistent data is almost certainly good enough.)

@whatsacomputertho
Member Author

whatsacomputertho commented Jul 17, 2025

or to work in https://github.com/opencontainers/distribution-spec to allow doing that filtering in a reasonable number of HTTP requests instead of by inspecting one image at a time.

@mtrmac I think I'd be interested in switching gears and trying to get some filtering capabilities written into the distribution spec if possible. Do you think this type of thing would need to be brought up on their mailing list ("longer discussions"), or do you think this would be a feasible enough request to just raise as an issue ("Issues are used for bugs and actionable items")?

@mtrmac
Contributor

mtrmac commented Jul 18, 2025

I don’t know, I’m afraid I haven’t meaningfully participated in that community.

@github-actions

A friendly reminder that this PR had no activity for 30 days.
