CEP 29: MatchSpec minilanguage #82

Merged
jaimergp merged 56 commits into conda:main from
jaimergp:matchspec
Mar 4, 2026

Conversation

Contributor

jaimergp commented Jun 4, 2024

Checklist for submitter

  • I am submitting a new CEP: MatchSpec minilanguage.
    • I am using the CEP template by creating a copy cep-0000.md named cep-XXXX.md in the root level.
  • I am submitting modifications to CEP XX.
  • Something else: (add your description here).

Checklist for CEP approvals

  • The vote period has ended and the vote has passed the necessary quorum and approval thresholds.
  • A new CEP number has been minted. Usually, this is ${greatest-number-in-main} + 1.
  • The cep-XXXX.md file has been renamed accordingly.
  • The # CEP XXXX - header has been edited accordingly.
  • The CEP status in the table has been changed to approved.
  • The last modification date in the table has been updated accordingly.
  • The pre-commit checks are passing.

Closes #80

Contributor Author

jaimergp commented Jun 4, 2024

I keep referring to the "MatchSpec" interface in other CEPs, yet it is not standardized, so here we go. Let's open that can of worms.

Contributor Author

jaimergp commented Jun 5, 2024

This will probably need another CEP on PackageRecord, which will probably ask for Repodata counterparts and... channel structure. Yay. I like how packaging.python.org does this btw. I'll probably copy some of that structure.

cep-??.md Outdated

### Exact matches

To fully specify a package record with a full, exact spec, these fields must be given as exact values: `channel` (preferably by URL), `subdir`, `name`, `version`, `build`. Alternatively, an exact spec can also be given by `*[md5=12345678901234567890123456789012]` or `*[sha256=f453db4ffe2271ec492a2913af4e61d4a6c118201f07de757df0eff769b65d2e]`.
Contributor

When matching by checksum, should you also add the subdir? If I'm not mistaken, it's possible for two subdirs to contain a package with the same checksum right? Or is this a corner case?

Contributor Author

These checksums come from the compressed artifacts, so in principle they should be unique (even with identical contents, the `index.json` file should have `"subdir": <subdir>`, I think?).

The hash that conda-build uses for the build_string doesn't consider the subdir, indeed (and maybe it should).

Contributor

Just FYI, rattler does not currently support this. There we require that at least the package name is still specified.
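
For illustration, the checksum form discussed in this thread could be split into its positional part and its bracket key-value pairs roughly as follows. This is a minimal sketch, not conda's or rattler's parser: `split_brackets` is a hypothetical helper, and it assumes at most one trailing bracket group whose values contain no commas or nested brackets.

```python
import re
from typing import Dict, Tuple

# Hypothetical helper for this sketch: "head[key=value,...]" -> (head, {key: value}).
_BRACKETS = re.compile(r"(?P<head>[^\[\]]*)(?:\[(?P<body>[^\[\]]*)\])?")


def split_brackets(spec: str) -> Tuple[str, Dict[str, str]]:
    match = _BRACKETS.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unsupported spec in this sketch: {spec!r}")
    pairs: Dict[str, str] = {}
    body = match["body"]
    if body:
        for item in body.split(","):
            key, sep, value = item.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got {item!r}")
            # Strip optional quotes around the value.
            pairs[key.strip()] = value.strip().strip("'\"")
    return match["head"].strip(), pairs


head, pairs = split_brackets("*[md5=12345678901234567890123456789012]")
print(head, pairs)
```

A parser with the restriction described for rattler would additionally reject the `*` head, since no package name is given.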

Contributor

baszalmstra left a comment

Thanks for the great write up @jaimergp !

cep-??.md Outdated

The simplest form merely consists of up to three positional arguments: `name [version [build]]`. Only `name` is required. `version` can be any version specifier. `build` can be any string matcher. See "Match conventions" below.

The positional syntax also allows the `=` character as a separator, instead of a space. When this is the case, versions are interpreted differently. `pkg=1.8` will be taken as `1.8.*` (fuzzy), but `pkg 1.8` will give `1.8` (exact). To have fuzzy matches with the space syntax, you need to use `pkg =1.8`. This nuance does not apply if a `build` string is present; both `foo==1.0=*` and `foo=1.0=*` are equivalent (they both understand the version as `1.0`, exact).
Contributor

I know this is just reporting the current state of affairs but, yucky.

In rattler, this form is no longer allowed when parsing in strict mode (it is still accepted in lenient parsing mode).

AntoinePrv commented Apr 15, 2025

@baszalmstra which form is not allowed?
IIRC, in mamba `pkg 1.8` and `pkg =1.8` are the same.

Contributor

baszalmstra commented Apr 15, 2025

The form `foo=1.0=bla` is disallowed (in strict mode only, which is used in rattler-build).
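
To make the separator rules quoted in this thread concrete, here is a minimal sketch of how the positional `name [version [build]]` form might be normalized. It only encodes the behavior described in the quoted paragraph (it is not conda's, mamba's, or rattler's implementation), `normalize_positional` is a hypothetical name, and writing exact matches as `==X` and fuzzy matches as `X.*` is a convention chosen for this example. Version expressions with other operators are rejected rather than handled.

```python
import re

# Sketch only: name, then either spaces or an '=' / '==' operator, then a
# version literal (or glob), then an optional build after spaces or '='.
_POSITIONAL = re.compile(
    r"(?P<name>[A-Za-z0-9_.\-]+)"
    r"(?:"
    r"(?:\s+|\s*(?P<op>==?))"
    r"(?P<version>[^\s=<>!~,|]+)"
    r"(?:(?:\s+|=)(?P<build>\S+))?"
    r")?"
)


def normalize_positional(spec: str):
    match = _POSITIONAL.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unsupported spec in this sketch: {spec!r}")
    name, op = match["name"], match["op"]
    version, build = match["version"], match["build"]
    if version is None:
        return name, None, None
    if "*" in version:
        # Already a glob: left untouched here.
        return name, version, build
    if op == "=" and build is None:
        # 'pkg=1.8' and 'pkg =1.8' are fuzzy.
        return name, f"{version}.*", build
    # 'pkg 1.8', 'pkg==1.8', and any form with a build string are exact.
    return name, f"=={version}", build


for text in ["pkg=1.8", "pkg 1.8", "pkg =1.8", "foo==1.0=*", "foo=1.0=*"]:
    print(text, "->", normalize_positional(text))
```

A strict parser in the spirit of the rattler behavior mentioned above would reject the `foo=1.0=*` form instead of normalizing it.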

cep-9999.md Outdated
Comment on lines +76 to +77
6. If `channel` is an exact value and `subdir` is an exact value, `subdir` is appended to
`channel` with a `/` separator. Otherwise, `subdir` is included in the key-value brackets.

How does this relate to label channels, e.g. `pytorch/label/nightly::libfaiss`?
With the separator logic, this will be assumed to be a subdir.

Contributor Author

The logic in conda is to take the last component and compare it against known subdirs. As a result, channels cannot be named like subdirs, e.g. I can't register a channel named `linux-64`.
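
A minimal sketch of the "compare the last component against known subdirs" rule described above. `split_channel_subdir` is a hypothetical helper, not conda's actual function, and `KNOWN_SUBDIRS` is only an illustrative subset of platform names.

```python
from typing import Optional, Tuple

# Illustrative subset only; a real implementation would carry the full list.
KNOWN_SUBDIRS = frozenset(
    {"noarch", "linux-64", "linux-aarch64", "osx-64", "osx-arm64", "win-64"}
)


def split_channel_subdir(channel: str) -> Tuple[str, Optional[str]]:
    channel = channel.rstrip("/")
    head, sep, tail = channel.rpartition("/")
    # Only the last path component is ever considered as a subdir candidate.
    if sep and head and tail in KNOWN_SUBDIRS:
        return head, tail
    return channel, None


print(split_channel_subdir("pytorch/label/nightly"))
print(split_channel_subdir("conda-forge/linux-64"))
```

Under this rule a label such as `nightly` stays part of the channel because it is not a known subdir name.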

Contributor

Hind-M commented Dec 16, 2024

Not sure about the current status of this CEP, but before moving forward with it, we should maybe consider finalizing this one if we think it could be of interest?

cep-9999.md Outdated

These are also accepted but have reduced utility. Their usage is discouraged:

- `url`
Contributor Author

Note that a full URL can be parsed into a MatchSpec object, so should a URL be considered a valid form? If so, note that parsers need to account for %-decoding. See conda/conda#14481.

jaimergp and others added 2 commits September 26, 2025 18:42
Co-authored-by: JeanChristopheMorinPerso <JeanChristopheMorinPerso@users.noreply.github.com>
@jaimergp
Contributor Author

@baszalmstra, @beckermr, @AntoinePrv, @ruben-arts, @JeanChristopheMorinPerso, I've tackled some of the pending items if you want to take a look. Perhaps more critically, the version strings and ordering conversation is now part of #132.

I think I'll rewrite part of the Specification so we don't lose time with historical details and go straight for the syntax, since it's all intertwined anyway... This is valid 🤦:

>>> from conda.models.match_spec import MatchSpec as MS
>>> str(MS("channel:namespace:pkg 1 2[subdir=linux-63,channel=XX,name=jaime]"))
'XX/linux-63::pkg==1=2'

@jaimergp jaimergp changed the title Add CEP for MatchSpec minilanguage CEP XXXX: MatchSpec minilanguage Sep 26, 2025
Contributor Author

jaimergp commented Feb 16, 2026

Dear @conda/steering-council,

The vote for this CEP has started. It will be open for two weeks, until March 2nd, 2026, 23:59 Anywhere on Earth. This time period has been chosen to make it eligible for time-out rules. As an Enhancement Proposal vote, it requires 60% affirmative votes to pass.

To vote, please mark the relevant checkbox under your username:

Contributor

msarahan left a comment

Thanks for writing this up. As mentioned in comments, I think you bend over backwards too much to support legacy stuff. It makes the descriptions more complicated. Just declare what the standard should be, and then separately describe the legacy stuff that is still supported, but should be avoided where possible.

- Ordered comparison, with the implied ordering described in [CEP PR #132](https://github.com/conda/ceps/pull/132):
- Exclusive ordered comparison, expressed as a version literal prefixed by `<` or `>`, MUST be interpreted as "smaller than" and "greater than", respectively, as per their position in the version ordering scheme.
- Inclusive ordered comparison, expressed as a version literal prefixed by one of these strings: `<=`, `>=`, MUST be interpreted as the corresponding exclusive ordered comparison ("smaller than" and "greater than", respectively), but they will also match if their position is equivalent in the version ordering scheme.
- Semver-like comparisons, expressed as a version literal prefixed by the `~=` string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal sans its last segment (e.g. `~=0.5.3` expands to `>=0.5.3,0.5.*`). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.
Contributor

Suggested change
- Semver-like comparisons, expressed as a version literal prefixed by the `~=` string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal sans its last segment (e.g. `~=0.5.3` expands to `>=0.5.3,0.5.*`). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.
- Semver-like "[compatible release](https://peps.python.org/pep-0440/#compatible-release)" comparisons, expressed as a version literal prefixed by the `~=` string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal sans its last segment (e.g. `~=0.5.3` expands to `>=0.5.3,0.5.*`). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.

This could also expand to >=0.5.3,<0.6, but then we get into pre-release goofiness with the 0a0 stuff. I guess the glob also has that problem, though. Or maybe this has been fixed and I'm just old.
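
As a small illustration of the expansion quoted in the suggestion (`~=0.5.3` to `>=0.5.3,0.5.*`), the following sketch rewrites a compatible-release expression into its expanded form. `expand_compatible_release` is a hypothetical name, and the sketch assumes a plain dot-separated version literal with at least two segments. It does not implement the `>=0.5.3,<0.6` alternative mentioned above.

```python
def expand_compatible_release(expression: str) -> str:
    """Rewrite '~=X.Y.Z' as '>=X.Y.Z,X.Y.*' (sketch, dot-separated literals only)."""
    if not expression.startswith("~="):
        raise ValueError(f"not a compatible-release expression: {expression!r}")
    literal = expression[2:].strip()
    segments = literal.split(".")
    if len(segments) < 2:
        raise ValueError("need at least two segments to drop the last one")
    # Fuzzy part: the literal without its last segment, followed by '.*'.
    prefix = ".".join(segments[:-1])
    return f">={literal},{prefix}.*"


print(expand_compatible_release("~=0.5.3"))
```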


### Version expression parsing

In the name of backwards compatibility, the (`name`, `version`, `build`) group in the `MatchSpec` syntax allows two types of separators: spaces and a single `=` character. This conditions how certain `version` expressions are parsed. Given a _version literal_ denoted as `version-literal` (i.e. no operators or asterisks), the following rules MUST apply:
Contributor

I think you should describe the current standard, and then mention legacy syntax that MUST be supported (or not)

pkg=1.8.*=*
pkg =1.8.* *
pkg ==1.8.* *
pkg[version=1.8.*]
Contributor

I think it would help clarify the syntax descriptions above if you had one of these examples for each syntax element. Expressing things in words is kind of unnatural for these symbolic expressions, which are by definition ways of expressing things more clearly and concisely than words.

Even better might be example matches, like:

<2 would match:

0.9
1.0
1.99
1.99.5a0
2.0a0

but not:
2.0

Perhaps this can be an expandable block to not visually bloat the document.
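
The `<2` example matches listed above can be reproduced with a deliberately simplified comparison. This is not the ordering defined in CEP PR #132: the sketch assumes that, within a segment, strings sort before integers and that missing components are padded with zero, and it ignores epochs, local versions, and special strings such as `dev` or `post`. All function names are hypothetical.

```python
import re
from itertools import zip_longest


def _segments(version):
    # "1.99.5a0" -> [[1], [99], [5, "a", 0]]
    return [
        [int(tok) if tok.isdigit() else tok for tok in re.findall(r"\d+|[a-z]+", seg)]
        for seg in version.lower().split(".")
    ]


def _compare_tokens(a, b):
    if type(a) is type(b):
        return (a > b) - (a < b)
    # Assumption for this sketch: strings sort before integers.
    return -1 if isinstance(a, str) else 1


def compare(left, right):
    # Missing segments and tokens are padded with zero.
    for seg_l, seg_r in zip_longest(_segments(left), _segments(right), fillvalue=[0]):
        for tok_l, tok_r in zip_longest(seg_l, seg_r, fillvalue=0):
            result = _compare_tokens(tok_l, tok_r)
            if result:
                return result
    return 0


def matches_less_than(version, bound):
    return compare(version, bound) < 0


candidates = ["0.9", "1.0", "1.99", "1.99.5a0", "2.0a0", "2.0"]
print([v for v in candidates if matches_less_than(v, "2")])
```

Under these assumptions `2.0a0` sorts before `2` because the `a` token compares lower than the padded zero, while `2.0` compares equal to `2` and is therefore excluded.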


The new syntax had to maintain backwards compatibility with the space- and `=`-separated forms too. This is the reason behind some surprising behaviors discussed in the specification above.

Advanced expressions like lookaround and backreferences are discouraged because they can incur performance issues leading to denial of service (DoS) and other security problems.
Contributor

So it's ok to use if I'm behind my corporate firewall? Where specifically should they be discouraged? In specs that are shared with others to create environments? Conda env files? I don't understand where the performance issue occurs. If it is within my local host, then that's just conda being slow. Perhaps that is a concern on shared infrastructure. On the other hand, if it affects the repos, then that is a very different story.

- `@feature`: a way to require the now-deprecated features (e.g. `@mkl`)
- `channel[subdir]::name`: an unambiguous way to add subdir information to the positional channel field (instead of slash separation). Keyword argument is preferred for disambiguation. By dropping this syntax, we only assign one meaning to square brackets: key-value pairs.

Future work may introduce a stricter syntax subset that further reduces the ambiguity in the specification (e.g. disallowing space-separated name-version-build triplets).
Contributor

Define the strict subset here! Just mention that there are also some legacy things that are still supported, and may be deprecated and removed in the future.

@jaimergp
Contributor Author

Thanks @msarahan, I'll apply the editorial changes shortly, but some of that feedback will change the CEP too much at this time. I'll make a note for a future revision of the MatchSpec specification, though.

@jaimergp
Contributor Author

@CJ-Wright @chenghlee @marcelotrevisani @mbargull @jakirkham, gentle reminder to cast your vote on this CEP.

Contributor Author

jaimergp commented Mar 3, 2026

The voting period has concluded. This is a CEP vote, so it requires 60% of the steering council to vote to reach quorum, and 60% affirmative votes to pass. However, since it was open for more than two weeks and three reminders were sent (actually, four), it also qualifies for timeout rules, where the quorum threshold is simply five people.

  • Size of the steering council: 15
    • Number of votes: 13 -> 86.67%
      • Yes: 13 -> 100%
      • No: 0
      • Abstain: 0
    • Did not vote: 2

This CEP has passed 🎉.

@jaimergp jaimergp changed the title CEP XXXX: MatchSpec minilanguage CEP 29: MatchSpec minilanguage Mar 4, 2026
@jaimergp jaimergp merged commit eb6cd51 into conda:main Mar 4, 2026
1 check passed
@jaimergp jaimergp deleted the matchspec branch March 4, 2026 10:32

Development

Successfully merging this pull request may close these issues.

CEP: MatchSpec query language
CEP request: Document MatchSpec