Conversation
I find myself referring to the "MatchSpec" interface in other CEPs even though it is not standardized, so here we go. Let's open that can of worms.
This will probably need another CEP on |
cep-??.md
Outdated
| ### Exact matches
| To fully specify a package record with a full, exact spec, these fields must be given as exact values: `channel` (preferably by URL), `subdir`, `name`, `version`, `build`. Alternatively, an exact spec can also be given by `*[md5=12345678901234567890123456789012]` or `*[sha256=f453db4ffe2271ec492a2913af4e61d4a6c118201f07de757df0eff769b65d2e]`.
When matching by checksum, should you also add the subdir? If I'm not mistaken, it's possible for two subdirs to contain a package with the same checksum right? Or is this a corner case?
These checksums are coming from the compressed artifacts, so in principle they should be unique (even with unique contents, the index.json file should have "subdir": <subdir>, I think?).
The hash that conda-build uses for the build_string doesn't consider the subdir, indeed (and maybe it should).
Just FYI, rattler does not currently support this. There we require that at least the package name is still specified.
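For reference, a minimal Python sketch of how a checksum-only spec like the ones in the quoted hunk could be assembled from a downloaded artifact. The helper name `checksum_spec` is hypothetical; it only formats the string and makes no claim about which tools accept it (as noted above, rattler still requires the package name).

```python
import hashlib
from pathlib import Path


def checksum_spec(artifact: Path, algorithm: str = "sha256") -> str:
    """Return a `*[<algorithm>=<hexdigest>]` spec for a compressed artifact.

    Hypothetical helper: the digest is computed over the artifact file itself,
    as discussed in the thread, not over its extracted contents.
    """
    if algorithm not in ("md5", "sha256"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with artifact.open("rb") as fh:
        # Read in 1 MiB chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return f"*[{algorithm}={digest.hexdigest()}]"
```

Because the digest covers the whole compressed file, two artifacts in different subdirs would only collide if their bytes were identical.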
baszalmstra left a comment
Thanks for the great write-up, @jaimergp!
cep-??.md
Outdated
| The simplest form merely consists of up to three positional arguments: `name [version [build]]`. Only `name` is required. `version` can be any version specifier. `build` can be any string matcher. See "Match conventions" below.
| The positional syntax also allows the `=` character as a separator, instead of a space. When this is the case, versions are interpreted differently. `pkg=1.8` will be taken as `1.8.*` (fuzzy), but `pkg 1.8` will give `1.8` (exact). To have fuzzy matches with the space syntax, you need to use `pkg =1.8`. This nuance does not apply if a `build` string is present; both `foo==1.0=*` and `foo=1.0=*` are equivalent (they both understand the version as `1.0`, exact).
I know this is just reporting the current state of affairs but, yucky.
In rattler, this form is no longer allowed when parsing in strict mode (it is still accepted in lenient parsing mode).
@baszalmstra which form is not allowed?
IIRC, in mamba `pkg 1.8` and `pkg =1.8` are the same.
The form `foo=1.0=bla` is disallowed! (In strict mode only, which is used in rattler-build.)
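To make the quoted rules concrete, here is a rough Python sketch of how the positional form could be interpreted for plain version literals. The function name `interpret_positional` and the regular expression are assumptions made for illustration; this is not conda's, mamba's, or rattler's parser, and it ignores operators such as `>=`, channels, and bracket syntax.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a name, then an optional version and build,
# separated by spaces, "=" or "==".
_POSITIONAL = re.compile(
    r"(?P<name>[^\s=]+)"
    r"(?:(?:\s*(?P<op>==|=)\s*|\s+)(?P<version>[^\s=]+)"
    r"(?:(?:\s*=\s*|\s+)(?P<build>\S+))?)?"
)


def interpret_positional(spec: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (name, version, build) following the rules in the hunk above."""
    match = _POSITIONAL.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a simple positional spec: {spec!r}")
    name, op, version, build = match.group("name", "op", "version", "build")
    # Only version literals (no operators or asterisks) are rewritten.
    is_literal = version is not None and not set(version) & set("*<>!~|,")
    # A single "=" before a literal means fuzzy, unless a build string follows.
    if is_literal and op == "=" and build is None:
        version = f"{version}.*"
    return name, version, build
```

With this sketch, `pkg=1.8` and `pkg =1.8` both yield `1.8.*`, `pkg 1.8` yields `1.8`, and `foo==1.0=*` and `foo=1.0=*` both yield version `1.0` with build `*`.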
cep-9999.md
Outdated
| 6. If `channel` is an exact value and `subdir` is an exact value, `subdir` is appended to
| `channel` with a `/` separator. Otherwise, `subdir` is included in the key-value brackets.
How does this relate to label channels, e.g. `pytorch/label/nightly::libfaiss`?
With the separator logic, this would be assumed to be a subdir.
The logic in conda is to take the last component and compare it against known subdirs. As a result, channels cannot be named like subdirs, e.g. I can't register a channel named `linux-64`.
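A minimal Python sketch of the "last component against known subdirs" check described above. The function name `split_channel_subdir` is hypothetical, and `KNOWN_SUBDIRS` is only an illustrative subset, not conda's actual list.

```python
from typing import Optional, Tuple

# Illustrative subset only; the real list of known subdirs is longer.
KNOWN_SUBDIRS = frozenset(
    {"noarch", "linux-64", "linux-aarch64", "osx-64", "osx-arm64", "win-64"}
)


def split_channel_subdir(
    channel: str, known_subdirs: frozenset = KNOWN_SUBDIRS
) -> Tuple[str, Optional[str]]:
    """Split a trailing subdir off a channel string, if there is one."""
    stripped = channel.rstrip("/")
    head, sep, last = stripped.rpartition("/")
    # Only the last path component is compared against the known subdirs.
    if sep and last in known_subdirs:
        return head, last
    # No slash, or the last component is not a subdir: keep the channel whole.
    # (Leaving a bare name untouched is an arbitrary choice of this sketch.)
    return stripped, None
```

Under these assumptions `pytorch/label/nightly` stays a channel because `nightly` is not a known subdir, while `conda-forge/linux-64` splits into channel `conda-forge` and subdir `linux-64`.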
Not sure about the current status of this CEP, but before moving forward with it, we should maybe consider finalizing this one if we think it could be of interest?
cep-9999.md
Outdated
| These are also accepted but have reduced utility. Their usage is discouraged:
| - `url`
Note that a full URL can be parsed into a MatchSpec object so... should a URL be considered a valid form? In those cases, note that parsers need to account for %-decoding. See conda/conda#14481.
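As a sketch of the %-decoding point, the following Python snippet splits a package URL into channel, subdir, and filename, decoding the last two components. The function name `split_package_url` and the example URL are hypothetical; only `urllib.parse.urlsplit` and `urllib.parse.unquote` from the standard library are used, and this is not how conda itself parses URLs.

```python
from urllib.parse import unquote, urlsplit


def split_package_url(url: str) -> dict:
    """Split `<channel>/<subdir>/<filename>` and percent-decode the tail."""
    parts = urlsplit(url)
    # Raises ValueError if the path has fewer than three components.
    channel_path, subdir, filename = parts.path.rsplit("/", 2)
    return {
        "channel": f"{parts.scheme}://{parts.netloc}{channel_path}",
        "subdir": unquote(subdir),
        # e.g. "%2B" in the URL becomes "+" in the filename.
        "fn": unquote(filename),
    }


# Hypothetical URL, used only to show the decoding step.
example = split_package_url(
    "https://example.invalid/my-channel/linux-64/foo-1.0%2Blocal-0.conda"
)
```

Without the `unquote` step, the filename would keep the literal `%2B` and would not match the package's real name, version, and build fields.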
Co-authored-by: Bas Zalmstra <zalmstra.bas@gmail.com>
Co-authored-by: JeanChristopheMorinPerso <JeanChristopheMorinPerso@users.noreply.github.com>
@baszalmstra, @beckermr, @AntoinePrv, @ruben-arts, @JeanChristopheMorinPerso, I've tackled some of the pending items if you want to take a look. Perhaps more critically, the version strings and ordering conversation is now part of #132. I think I'll rewrite part of the Specification so we don't lose time with historical details and go straight for the syntax, since it's all intertwined anyway... This is valid 🤦:
>>> str(MS("channel:namespace:pkg 1 2[subdir=linux-63,channel=XX,name=jaime]"))
'XX/linux-63::pkg==1=2'
MatchSpec minilanguage
Dear @conda/steering-council, The vote for this CEP has started. It will be open for two weeks, until March 2nd, 2026, 23:59 Anywhere on Earth. This time period has been chosen to make it eligible for time-out rules. As an Enhancement Proposal vote, it requires 60% affirmative votes to pass. To vote, please mark the relevant checkbox under your username:
msarahan left a comment
Thanks for writing this up. As mentioned in comments, I think you bend over backwards too much to support legacy stuff. It makes the descriptions more complicated. Just declare what the standard should be, and then separately describe the legacy stuff that is still supported, but should be avoided where possible.
| - Ordered comparison, with the implied ordering described in [CEP PR #132](https://github.com/conda/ceps/pull/132):
| - Exclusive ordered comparison, expressed as a version literal prefixed by `<` or `>`, MUST be interpreted as "smaller than" and "greater than", respectively, as per their position in the version ordering scheme.
| - Inclusive ordered comparison, expressed as a version literal prefixed by one of these strings: `<=`, `>=`, MUST be interpreted as "exclusive ordered comparison", respectively, but they will also match if their position is equivalent in the version ordering scheme.
| - Semver-like comparisons, expressed as a version literal prefixed by the `~=` string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal sans its last segment (e.g. `~=0.5.3` expands to `>=0.5.3,0.5.*`). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.
Suggested change:
| - Semver-like "[compatible release](https://peps.python.org/pep-0440/#compatible-release)" comparisons, expressed as a version literal prefixed by the `~=` string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal sans its last segment (e.g. `~=0.5.3` expands to `>=0.5.3,0.5.*`). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.
This could also expand to >=0.5.3,<0.6, but then we get into pre-release goofiness with the 0a0 stuff. I guess the glob also has that problem, though. Or maybe this has been fixed and I'm just old.
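For illustration, the glob-based expansion quoted in the hunk can be sketched in a few lines of Python. The function name `expand_compatible_release` is hypothetical. The sketch assumes segments are separated by `.` only, and rejecting single-segment literals is an assumption borrowed from PEP 440 rather than something the CEP text states.

```python
def expand_compatible_release(spec: str) -> str:
    """Expand `~=X.Y.Z` into `>=X.Y.Z,X.Y.*` as described in the hunk above."""
    if not spec.startswith("~="):
        raise ValueError(f"not a compatible-release spec: {spec!r}")
    literal = spec[2:].strip()
    segments = literal.split(".")
    if len(segments) < 2:
        # Assumption: without a last segment to drop there is nothing to glob.
        raise ValueError("compatible release needs at least two segments")
    # Fuzzy equality on the literal without its last segment.
    prefix = ".".join(segments[:-1])
    return f">={literal},{prefix}.*"
```

For the example in the hunk, `expand_compatible_release("~=0.5.3")` gives `>=0.5.3,0.5.*`. The `<0.6` alternative mentioned in the comment would need the version ordering rules to handle pre-releases, which this sketch does not attempt.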
| ### Version expression parsing
| In the name of backwards compatibility, the (`name`, `version`, `build`) group in the `MatchSpec` syntax allows two types of separators: spaces and a single `=` character. This conditions how certain `version` expressions are parsed. Given a _version literal_ denoted as `version-literal` (i.e. no operators or asterisks), the following rules MUST apply:
I think you should describe the current standard, and then mention legacy syntax that MUST be supported (or not)
| pkg=1.8.*=*
| pkg =1.8.* *
| pkg ==1.8.* *
| pkg[version=1.8.*]
I think it would help clarify the syntax descriptions above if you had one of these examples for each syntax element. Expressing things in words is kind of unnatural for these symbolic expressions, which are by definition ways of expressing things more clearly and concisely than words.
Even better might be example matches, like:
<2 would match:
0.9
1.0
1.99
1.99.5a0
2.0a0
but not:
2.0
Perhaps this can be an expandable block to not visually bloat the document.
| The new syntax had to maintain backwards compatibility with the space- and `=`-separated forms too. This is the reason behind some surprising behaviors discussed in the specification above.
| Advanced expressions like lookaround and backreferences are discouraged because they can incur performance issues leading to DOS and other security problems.
So it's ok to use if I'm behind my corporate firewall? Where specifically should they be discouraged? In specs that are shared with others to create environments? Conda env files? I don't understand where the performance issue occurs. If it is within my local host, then that's just conda being slow. Perhaps that is a concern on shared infrastructure. On the other hand, if it affects the repos, then that is a very different story.
| - `@feature`: a way to require the, now deprecated, features (e.g. `@mkl`)
| - `channel[subdir]::name`: a non ambiguous way to add subdir information to the positional channel field (instead of slash separation). Keyword argument is preferred for disambiguation. By dropping this syntax, we only assign one meaning to square brackets: key-value pairs.
| Future work may introduce a stricter syntax subset that further reduces the ambiguity in the specification (e.g. disallowing space-separated name-version-build triplets).
Define the strict subset here! Just mention that there are also some legacy things that are still supported, and may be deprecated and removed in the future.
Thanks @msarahan, I'll apply the editorial changes shortly, but some of that feedback will change the CEP too much at this time. I'll make a note for a future revision of the MatchSpec specification, though.
@CJ-Wright @chenghlee @marcelotrevisani @mbargull @jakirkham, gentle reminder to cast your vote on this CEP.
The voting period has concluded. This is a CEP vote, so it requires 60% of the steering council to vote to reach quorum, and 60% affirmative votes to pass. However, since it was open for more than two weeks and three reminders were sent (actually, four), it also qualifies for timeout rules, where the quorum threshold is simply five people.
This CEP has passed 🎉.
Co-authored-by: msarahan <msarahan@users.noreply.github.com>
MatchSpec minilanguage
Checklist for submitter
`cep-0000.md` named `cep-XXXX.md` in the root level.
Checklist for CEP approvals
`${greatest-number-in-main} + 1`.
`cep-XXXX.md` file has been renamed accordingly.
`# CEP XXXX -` header has been edited accordingly.
`pre-commit` checks are passing.
Closes #80