56 commits
f392cb6
Add CEP for MatchSpec minilanguage
jaimergp Jun 4, 2024
e2dfdbb
Merge branch 'main' of github.com:conda/ceps into matchspec
jaimergp Sep 26, 2025
5a3c4e0
add namespace
jaimergp Sep 26, 2025
fbb8ddb
add example glob -> regex
jaimergp Sep 26, 2025
6836bc7
Address review suggestions
jaimergp Sep 26, 2025
80455ab
Small detail
jaimergp Sep 26, 2025
7a8e3a2
clarify different syntaxes
jaimergp Sep 26, 2025
8e79b58
Satisfy linter
jaimergp Sep 26, 2025
9afdba3
Rename for linter
jaimergp Sep 26, 2025
4d56cce
Disallow mixing *
jaimergp Sep 26, 2025
76112fb
clarify discouraged keywords
jaimergp Sep 26, 2025
318cf31
link to sections
jaimergp Sep 26, 2025
20e98d5
pre-commit
jaimergp Sep 26, 2025
5480185
Do not mention grammar
jaimergp Sep 26, 2025
bfa2e8c
Add historical notes to Rationale
jaimergp Sep 30, 2025
2ea0739
Revamp matching rules
jaimergp Oct 1, 2025
c2b6fd3
More implementations
jaimergp Oct 1, 2025
9bde2a6
Rewrite syntax
jaimergp Oct 1, 2025
43ce192
Describe string normalization
jaimergp Oct 1, 2025
7dfe8af
Rename
jaimergp Oct 24, 2025
e7582ee
Add Requires
jaimergp Oct 24, 2025
d477a27
Fix Requires
jaimergp Oct 24, 2025
567ebb7
Clarify quoting rules
jaimergp Jan 29, 2026
c7204e0
Clarify role of name globs in matchspec subtypes
jaimergp Jan 29, 2026
79944d0
Clarify version matching nomenclature
jaimergp Jan 29, 2026
45b3a03
add link to Cheng's grammar
jaimergp Jan 29, 2026
cfb8a73
Use "version literal" instead of "version identifier"
jaimergp Jan 29, 2026
dad7e10
Clarify spaces in name, version, build
jaimergp Jan 29, 2026
7e77441
URLs can be MatchSpecs...
jaimergp Jan 29, 2026
87c307f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 29, 2026
08f8743
Clarify target collections for queries
jaimergp Jan 31, 2026
b705ab8
Move "search vs solver" specs to appendices
jaimergp Jan 31, 2026
694b7cf
Add special rules for channel matching
jaimergp Jan 31, 2026
f08aed0
Comment on space separators inside square brackets
jaimergp Jan 31, 2026
67b74ed
Clariy record field
jaimergp Jan 31, 2026
51fe606
Clarify subdirs in :: syntax
jaimergp Jan 31, 2026
ae60f20
Allow spaces around commas
jaimergp Jan 31, 2026
c5c3bb1
String matching is always case insensitive
jaimergp Jan 31, 2026
a935207
Discourage advanced regex matching
jaimergp Jan 31, 2026
1d2ec1f
Clary globs can be multiple
jaimergp Jan 31, 2026
54d249c
Allow parenthese for version clause grouping
jaimergp Jan 31, 2026
3c33e73
Elaborate types of clauses
jaimergp Jan 31, 2026
9800aec
Correct different variations of globbing and literal version matching
jaimergp Jan 31, 2026
0774130
Use blocks instead of paragraphs
jaimergp Jan 31, 2026
5dc6ea5
Move Fully specified expressions to appendix
jaimergp Jan 31, 2026
b1a1ac0
Move canonical representation to appendix
jaimergp Jan 31, 2026
69083bf
pre-commit
jaimergp Jan 31, 2026
8879ed4
Write down exotic syntax that was dropped from the standard.
jaimergp Jan 31, 2026
79e4e2c
add 'test' to backticks
jaimergp Feb 15, 2026
c0fecf4
Minor editorial changes
jaimergp Feb 15, 2026
b81ae1f
Add RFC2119 note
jaimergp Feb 15, 2026
555fa5b
Minor editorial changes
jaimergp Feb 20, 2026
65c5daa
Missing tr
jaimergp Feb 20, 2026
649ae4c
Merge branch 'main' of github.com:conda/ceps into matchspec
jaimergp Mar 4, 2026
c825e25
Mint as 29. Update references and status
jaimergp Mar 4, 2026
ac7a3e9
Clearer wording in some parts of the specification
jaimergp Mar 4, 2026
1 change: 1 addition & 0 deletions README.md
@@ -39,6 +39,7 @@ for conda's implementation, all major changes should be submitted as
| [0026](cep-0026.md) | Identifying Packages and Channels in the conda Ecosystem |
| [0027](cep-0027.md) | Standardizing a publish attestation for the conda ecosystem |
| [0028](cep-0028.md) | Customizable system DLL linkage checks for Windows |
| [0029](cep-0029.md) | The `MatchSpec` query language |

## References

254 changes: 254 additions & 0 deletions cep-0029.md
@@ -0,0 +1,254 @@
# CEP 29 - The `MatchSpec` query language

<table>
<tr><td> Title </td><td> The <code>MatchSpec</code> query language </td></tr>
<tr><td> Status </td><td> Accepted </td></tr>
<tr><td> Author(s) </td><td>
Jaime Rodríguez-Guerra &lt;jaime.rogue@gmail.com&gt;,
Cheng H. Lee &lt;clee@anaconda.com&gt;,
Bas Zalmstra &lt;bas@prefix.dev&gt;
</td></tr>
<tr><td> Created </td><td> June 4, 2024 </td></tr>
<tr><td> Updated </td><td> Mar 4, 2026 </td></tr>
<tr><td> Discussion </td><td>https://github.com/conda/ceps/pull/82</td></tr>
<tr><td> Implementation </td><td>https://github.com/conda/conda/blob/4.3.34/conda/resolve.py#L33, https://github.com/conda/conda/blob/25.7.0/conda/models/match_spec.py#L85, https://docs.rs/rattler_conda_types/latest/rattler_conda_types/struct.MatchSpec.html, https://github.com/mamba-org/mamba/blob/2.3.2/libmamba/src/specs/match_spec.cpp, https://github.com/openSUSE/libsolv/blob/0.7.35/src/conda.c#L567 </td></tr>
<tr><td> Requires </td><td> CEP 33, CEP 34, CEP 36 </td></tr>
</table>

> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as
described in [RFC2119][RFC2119] when, and only when, they appear in all capitals, as shown here.

## Abstract

This CEP standardizes the syntax for the `MatchSpec` query language.

## Motivation

The motivation of this CEP is merely informative. It describes the details of an existing query language.

## Nomenclature

The `MatchSpec` query syntax is a mini-language designed to select individual entries in a collection of package records. It is sometimes referred to as simply _spec_ or _conda spec_.

## Specification

`MatchSpec` strings provide a compact method to query collections of conda artifacts (e.g. in a conda channel, or in an installed environment) by matching `str` and `int` fields on package records (see [CEP 34: `./info/index.json`](./cep-0034.md) and [CEP 36: Package Record Metadata](./cep-0036.md)). Note that fields using other types, like `list[str]` (`depends`, `constrains`, etc.), cannot be matched by this syntax.

### Syntax

The `MatchSpec` syntax can be thought of as a structured collection of _matching expressions_, each targeting a package record field. A matching expression is defined as a string that MUST follow these rules:

- For expressions targeting the `version` field, [version specifier rules](#version-matching) MUST be applied.
- For expressions targeting the `channel` field, [channel specifier rules](#channel-matching) MUST be applied.
- For expressions targeting any other `str` field, [string matching conventions](#string-matching) MUST be used.
- For expressions targeting `int` fields, the target value MUST be converted to `str` and handled as such.

The full `MatchSpec` syntax takes this approximate form, with parentheses denoting optional fields:

```text
(channel(/subdir):(namespace):)name(version(build))([key1='value 1'(, )key2=value2])
```

More precisely, the following rules MUST apply:

- A `MatchSpec` string MAY exhibit two forms of expressions: positional and keyword based.
- Six positional expressions are recognized. From left to right, they can be arranged in two groups: (`channel`, `subdir`, `namespace`) and (`name`, `version`, `build`).
- The first group is optional. If present, it MUST be separated from the second group by a single colon character `:`. Within this group, there are four items:
  - `channel: str`. Optional.
  - `subdir: str`. Optional. It requires `channel` to be defined. MUST be separated from `channel` by a single forward slash, `/`. It MUST use a known subdir identifier; otherwise it could be interpreted as the last component of a channel URL.
  - A colon `:` separator, required if `channel` or `namespace` is defined.
  - `namespace: str`. Optional. This expression field MUST be parsed and ignored.
- The second group contains three expressions. They MUST be separated by either spaces or a single `=` character. Separator types MUST NOT be mixed. See the [version expression parsing notes](#version-expression-parsing) for additional details on the interaction between the `=` symbol as a separator and as an operator. Leading and trailing spaces MUST be ignored.
  - `name: str`. Required. Empty names MUST be represented as `*`.
  - `version: str | VersionSpec`. Optional.
  - `build: str`. Optional. It requires `version` to be present.
- All keyword expressions are optional. If present, they MUST be enclosed in a single set of square brackets, after the positional expressions. The following rules apply:
  - Keyword expressions are written as key-value pairs. They MUST be built by joining the name of the target record field (key) and the expression string (value) with a single `=` character.
  - The value MUST be quoted with single `'` or double `"` quotes if it contains spaces, commas, equal signs, or square brackets. Quoting rules follow [Python's string literals](https://docs.python.org/3/reference/lexical_analysis.html#strings).
  - Keyword expression pairs MUST be separated by a single comma character `,`. Historically, spaces have also been allowed as separators but SHOULD NOT be used.
  - Spaces around comma separators MAY be allowed and MUST be ignored.
- When both positional and keyword expressions are used, the keyword expressions override the positional values, except for `name`: its keyword expression MUST be ignored.
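
The keyword rules above can be illustrated with a minimal Python sketch. It is not taken from any of the listed implementations: `split_keywords` and `merge_fields` are hypothetical helper names, the first `[` is assumed to open the keyword block, and escape sequences inside quoted values are not handled.

```python
from __future__ import annotations

import re

# One key=value pair; the value is single-quoted, double-quoted, or bare.
_PAIR = re.compile(r"""(\w+)\s*=\s*(?:'([^']*)'|"([^"]*)"|([^,\s\]]+))""")


def split_keywords(spec: str) -> tuple[str, dict[str, str]]:
    """Split a spec into its positional part and its keyword expressions."""
    spec = spec.strip()
    if not spec.endswith("]") or "[" not in spec:
        return spec, {}
    # Assumption: the first "[" opens the keyword block.
    positional, _, body = spec[:-1].partition("[")
    keywords = {}
    for match in _PAIR.finditer(body):
        key, single, double, bare = match.groups()
        # Exactly one of the three value groups takes part in a match.
        keywords[key] = next(v for v in (single, double, bare) if v is not None)
    return positional.strip(), keywords


def merge_fields(positional: dict[str, str], keywords: dict[str, str]) -> dict[str, str]:
    """Keyword expressions override positional ones, except for `name`."""
    merged = {**positional, **keywords}
    if "name" in positional:
        merged["name"] = positional["name"]
    return merged
```

For instance, `split_keywords("foo[version='>=1.0', build=py27_0]")` separates `foo` from the two keyword expressions, and `merge_fields` keeps the positional `name` even if a `name` keyword is present.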

### Matching conventions

#### String matching

Matching expressions that target string fields MUST be interpreted using these case-insensitive rules:

- If the expression begins with `^` and ends with `$`, it MUST be interpreted as a regular expression (regex). The expression matches if the regex search returns a hit; e.g. with Python: `re.search(expression, field) is not None`. Advanced expressions like lookaround and backreferences SHOULD NOT be allowed.
- If the expression contains one or more asterisks (`*`), it is considered a glob expression and MUST be converted into a regular expression and interpreted as such. To convert a glob expression into a regex string:
  1. Escape all characters that are special in regular expressions (e.g. using Python's `re.escape`).
  2. Replace escaped asterisks (`\*`) by `.*`.
  3. Wrap the resulting string with `^` and `$`.
- Otherwise, matches MUST be tested with exact, case-insensitive string equality.
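
As a rough illustration of the three cases above, the following Python sketch applies them in order. The function names `glob_to_regex` and `string_match` are hypothetical, and the `str()` conversion reflects the rule that `int` fields are handled as strings.

```python
import re
from typing import Union


def glob_to_regex(glob: str) -> str:
    """Convert a glob expression into an anchored regex string."""
    escaped = re.escape(glob)  # step 1: escape regex-special characters
    return "^" + escaped.replace(r"\*", ".*") + "$"  # steps 2 and 3


def string_match(expression: str, field: Union[str, int]) -> bool:
    """Case-insensitive match of an expression against a record field."""
    value = str(field)  # int fields are converted to str first
    if expression.startswith("^") and expression.endswith("$"):
        return re.search(expression, value, re.IGNORECASE) is not None
    if "*" in expression:
        return re.search(glob_to_regex(expression), value, re.IGNORECASE) is not None
    return expression.lower() == value.lower()
```

With this sketch, `string_match("py3*", "py311_0")` is truthy, while `string_match("py3*", "cpy311")` is not because the converted regex is anchored at both ends.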

#### Channel matching

Channel fields MUST be matched with the same rules as strings.

The value of a channel expression MUST allow both names and full URLs. When a name is used (as per [CEP 26](./cep-0026.md)), it MUST be promoted to its corresponding fully qualified URL before comparison.

#### Version matching

Expressions targeting the `version` field MUST be handled with additional rules. These expressions are referred to as _version specifiers_.

A version specifier MUST consist of one or more _version clauses_, separated by logical operators that MUST follow these rules:

- `|` denotes the logical OR.
- `,` denotes the logical AND.
- `,` (AND) has higher precedence than `|` (OR).
- Parentheses `()` MAY be used to modify precedence.

A _version clause_ consists of either:

- A single version literal (as defined in [CEP 33](./cep-0033.md)).
- An operator plus a single version literal.
- A single version literal containing one or more globs (`*`).
- A single glob (`*`).

> For example, given a string `python>=3,<4`, the version specifier is the full expression `>=3,<4`, which consists of two clauses (`>=3`, `<4`) separated by `,` (AND). Each clause contains a version literal (`3` and `4`, respectively).
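
The precedence rules can be sketched in a few lines of Python. This is a simplified illustration that assumes no parentheses and no regex clauses (which could themselves contain `|` or `,`); `split_clauses` and `specifier_matches` are hypothetical names, and the per-clause test is supplied by the caller.

```python
from __future__ import annotations

from typing import Callable


def split_clauses(specifier: str) -> list[list[str]]:
    """Return OR-groups of AND-ed clauses; `,` binds tighter than `|`."""
    compact = specifier.replace(" ", "")  # spaces are removed and ignored
    return [group.split(",") for group in compact.split("|")]


def specifier_matches(specifier: str, clause_matches: Callable[[str], bool]) -> bool:
    """True if every clause of at least one OR-group passes the given test."""
    return any(
        all(clause_matches(clause) for clause in group)
        for group in split_clauses(specifier)
    )
```

For example, `split_clauses(">=3,<4|2.7.*")` yields `[[">=3", "<4"], ["2.7.*"]]`: the specifier matches if both `>=3` and `<4` hold, or if `2.7.*` holds.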

Each version clause MUST be described by one of these types:

- [String matching](#string-matching) rules apply when:
  - The value is a regex (surrounded by `^` and `$`).
  - The value contains a non-trailing glob (`*`).
- Exact equality, expressed as a version literal prefixed by the double-equals string `==`, MUST be interpreted as normalized version literal equality.
- Fuzzy equality is expressed as either a version literal prefixed by one `=` symbol, or a version literal trailed by `.*` or `*`. After removing the leading `=` character and appending a `.*` suffix, the comparison is truthy only when all the version segments before the glob are equal.
- Exclusion, expressed as a version literal or a version literal augmented with globs, prefixed by the string `!=`, MUST be interpreted as a negated fuzzy equality.
- Ordered comparison, with the implied ordering described in [CEP 33](./cep-0033.md):
  - Exclusive ordered comparison, expressed as a version literal prefixed by `<` or `>`, MUST be interpreted as "smaller than" and "greater than", respectively, as per their position in the version ordering scheme.
  - Inclusive ordered comparison, expressed as a version literal prefixed by `<=` or `>=`, MUST be interpreted as "smaller than or equal to" and "greater than or equal to", respectively, where equality means normalized version literal equality.
- Semver-like comparisons, expressed as a version literal prefixed by the `~=` string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal sans its last segment (e.g. `~=0.5.3` expands to `>=0.5.3,0.5.*`). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.

Review comment (Contributor):

Suggested change
- Semver-like comparisons, expressed as a version literal prefixed by the `~=` string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal sans its last segment (e.g. `~=0.5.3` expands to `>=0.5.3,0.5.*`). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.
- Semver-like "[compatible release](https://peps.python.org/pep-0440/#compatible-release)" comparisons, expressed as a version literal prefixed by the `~=` string, MUST be interpreted as greater than or equal to the version literal while also matching a fuzzy equality test for the version literal sans its last segment (e.g. `~=0.5.3` expands to `>=0.5.3,0.5.*`). This operator is considered deprecated, and its expanded alternative SHOULD be used instead.

This could also expand to >=0.5.3,<0.6, but then we get into pre-release goofiness with the 0a0 stuff. I guess the glob also has that problem, though. Or maybe this has been fixed and I'm just old.


Version expressions SHOULD NOT contain spaces between operators; if present, these spaces MUST be removed and ignored.
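
Two of the clause types above lend themselves to a short Python sketch. It makes simplifying assumptions: versions are split on dots only, without the normalization and ordering defined in CEP 33, and trailing `.*` and `*` are treated alike. The names `expand_compatible_release` and `fuzzy_equal` are hypothetical.

```python
def expand_compatible_release(clause: str) -> str:
    """Expand `~=X.Y.Z` into its recommended `>=X.Y.Z,X.Y.*` form."""
    if not clause.startswith("~="):
        raise ValueError(f"not a '~=' clause: {clause!r}")
    literal = clause[2:]
    prefix, _, _ = literal.rpartition(".")  # drop the last segment
    if not prefix:
        raise ValueError("'~=' needs a literal with at least two segments")
    return f">={literal},{prefix}.*"


def fuzzy_equal(clause: str, version: str) -> bool:
    """Compare the segments before the glob (dot-separated segments only)."""
    clause = clause.replace(" ", "")
    if clause.startswith("=") and not clause.startswith("=="):
        clause = clause[1:]  # remove the single leading '='
    if clause.endswith(".*"):
        clause = clause[:-2]
    elif clause.endswith("*"):
        clause = clause[:-1]
    wanted = clause.lower().split(".")
    actual = version.lower().split(".")
    return actual[: len(wanted)] == wanted
```

Here `expand_compatible_release("~=0.5.3")` returns `>=0.5.3,0.5.*`, and `fuzzy_equal("=1.8", "1.8.3")` is truthy while `fuzzy_equal("1.8.*", "1.80")` is not.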

### Version expression parsing

For backwards compatibility, the (`name`, `version`, `build`) group in the `MatchSpec` syntax allows two types of separators: spaces and a single `=` character. This affects how certain `version` expressions are parsed. Given a _version literal_ denoted as `version-literal` (i.e. no operators or asterisks), the following rules MUST apply:

Review comment (Contributor):

I think you should describe the current standard, and then mention legacy syntax that MUST be supported (or not)


- If the string only contains two fields, which MUST be `name` and `version`:
  - `{name}={version-literal}` and `{name} ={version-literal}` (note the space) both denote fuzzy equality. They are equivalent to `{name}[version={version-literal}.*]` and `{name} {version-literal}.*`.
  - `{name} {version-literal}` denotes exact equality. It is equivalent to `{name}[version={version-literal}]` and `{name}=={version-literal}`.
- If the string contains three fields, `name`, `version` and `build`:
  - `{name} {version-literal} {build}`, `{name} =={version-literal} {build}`, `{name}={version-literal}={build}` and `{name}=={version-literal}={build}` all denote exact equality. They are equivalent to `{name}[version={version-literal},build={build}]`.
  - `{name} ={version-literal} {build}` denotes fuzzy equality.

Some examples for `name=pkg` and `version-literal=1.8`, with equivalent version specifiers in the same block:

```text
pkg=1.8
pkg =1.8
pkg 1.8.*
pkg 1.8.* *
pkg=1.8.*
pkg=1.8.*=*
pkg =1.8.* *
pkg ==1.8.* *
pkg[version=1.8.*]
pkg[version="1.8.*"]
```

Review comment (Contributor):

I think it would help clarify the syntax descriptions above if you had one of these examples for each syntax element. Expressing things in words is kind of unnatural for these symbolic expressions, which are by definition ways of expressing things more clearly and concisely than words.

Even better might be example matches, like:

<2 would match:

0.9
1.0
1.99
1.99.5a0
2.0a0

but not:
2.0

Perhaps this can be an expandable block to not visually bloat the document.

```text
pkg 1.8
pkg 1.8 *
pkg==1.8
pkg=1.8=*
pkg==1.8=*
pkg ==1.8 *
pkg[version=1.8]
pkg[version="1.8"]
```
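
The separator rules and the two example blocks above can be reproduced with a small Python sketch. It only covers the positional forms listed in this section (a bare name, a version introduced by a space, `=` or `==`, and an optional build); other operators are rejected, and inputs mixing separators are not validated. `parse_positional` is a hypothetical name, and the returned version uses the keyword form (`1.8` for exact, `1.8.*` for fuzzy equality).

```python
from __future__ import annotations

import re

_POSITIONAL = re.compile(
    r"^(?P<name>[^\s=<>!~]+)"
    r"(?P<sep1>\s*==|\s*=|\s+)"
    r"(?P<version>[^\s=<>!~]+)"
    r"(?:(?P<sep2>=|\s+)(?P<build>[^\s=]+))?$"
)


def parse_positional(spec: str) -> tuple[str, str, str | None]:
    """Return (name, version expression in keyword form, build or None)."""
    match = _POSITIONAL.match(spec.strip())
    if match is None:
        raise ValueError(f"unsupported positional form: {spec!r}")
    name, version, build = match["name"], match["version"], match["build"]
    if "*" in version:
        return name, version, build  # already a glob; keep as written
    operator = match["sep1"].strip()  # "", "=" or "=="
    # A single "=" means fuzzy equality unless "=" also separates the build.
    fuzzy = operator == "=" and match["sep2"] != "="
    return name, (version + ".*" if fuzzy else version), build
```

Under these assumptions, `parse_positional("pkg=1.8")` gives `("pkg", "1.8.*", None)`, whereas `parse_positional("pkg=1.8=*")` gives `("pkg", "1.8", "*")`.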

## Examples

```python
>>> from conda.models.match_spec import MatchSpec
>>> str(MatchSpec('foo 1.0 py27_0'))
'foo==1.0=py27_0'
>>> str(MatchSpec('foo=1.0=py27_0'))
'foo==1.0=py27_0'
>>> str(MatchSpec('conda-forge::foo[version=1.0.*]'))
'conda-forge::foo=1.0'
>>> str(MatchSpec('conda-forge/linux-64::foo>=1.0'))
"conda-forge/linux-64::foo[version='>=1.0']"
>>> str(MatchSpec('*/linux-64::foo>=1.0'))
"foo[subdir=linux-64,version='>=1.0']"
```

## Rationale

The initial `MatchSpec` form was a simpler `name [version [build]]` syntax (still in use in build recipes), with two optional keyword arguments (`optional`, `target`) between parentheses. The CLI also had its own string specification, which only supported name and version separated by `=` symbols (see [`conda 4.3.x`'s `spec_from_line()`](https://github.com/conda/conda/blob/b65743878d9d368dc45dc0089a651e72adb10274/conda/cli/common.py#L517-L540)). `conda search` allowed queries based on regexes only.

With `conda 4.4.0`, a new syntax was introduced to unify and consolidate all these different variations (see [release notes for 4.4.0](https://github.com/conda/conda/blob/main/CHANGELOG.md#new-feature-highlights-2), [`conda/conda#4158`](https://github.com/conda/conda/pull/4158), and [`conda/conda#5517`](https://github.com/conda/conda/pull/5517)), and also brought channel and subdir matching (fields before `::`) and arbitrary record field matching in between square brackets.

The new syntax had to maintain backwards compatibility with the space- and `=`-separated forms too. This is the reason behind some surprising behaviors discussed in the specification above.

Advanced expressions like lookaround and backreferences are discouraged because they can cause performance problems that lead to denial of service (DoS) and other security issues.

Review comment (Contributor):

So it's ok to use if I'm behind my corporate firewall? Where specifically should they be discouraged? In specs that are shared with others to create environments? Conda env files? I don't understand where the performance issue occurs. If it is within my local host, then that's just conda being slow. Perhaps that is a concern on shared infrastructure. On the other hand, if it affects the repos, then that is a very different story.


Mixing `*` with other version-specific operators is disallowed as per the recommendations discussed in <https://github.com/conda/ceps/pull/60>.

Some legacy syntax that is still recognized by `conda` was intentionally left out of this CEP due to lack of usage in practice. Examples include:

- `[optional]`: bare keyword (no value) that is used internally by the classic solver to track droppable requirements
- `(optional=True)`: same as above, but with different syntax. `conda` allows parenthesized blocks after square brackets, with arbitrary contents.
- `@feature`: a way to require the now-deprecated features (e.g. `@mkl`)
- `channel[subdir]::name`: an unambiguous way to add subdir information to the positional channel field (instead of slash separation). The keyword argument is preferred for disambiguation. By dropping this syntax, we only assign one meaning to square brackets: key-value pairs.

Future work may introduce a stricter syntax subset that further reduces the ambiguity in the specification (e.g. disallowing space-separated name-version-build triplets).

Review comment (Contributor):

Define the strict subset here! Just mention that there are also some legacy things that are still supported, and may be deprecated and removed in the future.


## Appendices

### Appendix A: Canonical representation

The canonical string representation of a `MatchSpec` expression proposed by `conda` follows these rules:

1. `name` is required and MUST be written as a positional expression. Empty names MUST be written as `*`.
2. If `version` describes an exact equality expression, it MUST be written as a positional expression, prepended by `==`. If `version` denotes fuzzy equality (e.g. `1.11.*`), it MUST be written as a positional expression with the `.*` suffix left off and prepended by `=`. Otherwise `version` MUST be included inside the key-value brackets.
3. If `version` is an exact equality expression, and `build` does not contain asterisks, `build` MUST be written as a positional expression, prepended by `=`. Otherwise, `build` MUST go inside the key-value brackets.
4. If `channel` is defined and does not contain asterisks, a `::` separator is used between `channel`
and `name`. `channel` MAY be represented by its name or full, subdir-less URL.
5. If both `channel` and `subdir` do not contain asterisks, `subdir` is appended to
`channel` with a `/` separator. Otherwise, `subdir` is included in the key-value brackets.
6. Key-value pairs MUST be separated by commas, with no spaces between delimiters. Values MUST be quoted with single quotes.
7. The `namespace` field MUST NOT be represented.
8. Case-insensitive string fields MUST be lowercased.

### Appendix B: Search vs solver `MatchSpec`

`MatchSpec` strings can be used under two different contexts:

- Search queries: To obtain all the artifacts matching the query against a collection of packages. Results may include more than one entry per package name.
- Solver requests: To obtain the subset of packages in an index that satisfy the request and their dependency metadata. Results must only include one entry per package name.

In contrast with search queries, only some `MatchSpec` fields make sense for solver requests. The most common ones are `name`, `version`, `build`, and `channel`.

### Appendix C: Fully specified expressions

To uniquely identify a single package record, a `MatchSpec` expression can be constructed in two ways:

- By passing exact values to the fields `channel` (preferably by URL), `subdir`, `name`, `version`, `build`.
- By matching its checksum directly: `*[md5=12345678901234567890123456789012]` or `*[sha256=f453db4ffe2271ec492a2913af4e61d4a6c118201f07de757df0eff769b65d2e]`.

Note that an artifact URL may be parsed into a fully specified `MatchSpec`. Given:

```text
https://conda.anaconda.org/conda-forge/linux-64/python-3.11.10-h123456_0.conda
[----------channel--------------------|-subdir-|-name-|version|-build---]
```

The URL above becomes `conda-forge/linux-64::python==3.11.10[build=h123456_0]`.
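
The URL decomposition shown in the diagram can be sketched in Python as follows. This is an illustration only: `parse_artifact_url` is a hypothetical name, the channel is kept as its full URL rather than demoted to a name, and the two recognized extensions (`.conda` and `.tar.bz2`) are an assumption of this sketch.

```python
from __future__ import annotations


def parse_artifact_url(url: str) -> dict[str, str]:
    """Split an artifact URL into the fields of a fully specified spec."""
    channel, subdir, filename = url.rsplit("/", 2)
    for extension in (".conda", ".tar.bz2"):
        if filename.endswith(extension):
            stem = filename[: -len(extension)]
            break
    else:
        raise ValueError(f"unknown artifact extension: {filename!r}")
    # Package names may contain hyphens, so split from the right.
    name, version, build = stem.rsplit("-", 2)
    return {
        "channel": channel,
        "subdir": subdir,
        "name": name,
        "version": version,
        "build": build,
    }
```

Applied to the URL above, this returns `python`, `3.11.10` and `h123456_0` for `name`, `version` and `build`, with `linux-64` as the `subdir`.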

## References

- [`conda.models.match_spec.MatchSpec`](https://github.com/conda/conda/blob/24.5.0/conda/models/match_spec.py)
- [`rattler_conda_types::match_spec`](https://github.com/conda/rattler/blob/rattler-v0.37.4/crates/rattler_conda_types/src/match_spec/mod.rs)
- [Package match specifications at conda-build docs](https://docs.conda.io/projects/conda-build/en/latest/resources/package-spec.html#package-match-specifications)
- [Comparison of `MatchSpec` implementation in `conda` vs a LARK grammar](https://github.com/chenghlee/conda-matchspec-grammar)
- [Comparison of `MatchSpec` implementations in `conda`, `rattler` and `mamba`](https://github.com/baszalmstra/cep-matchspec-tests)

## Copyright

All CEPs are explicitly [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/).

[RFC2119]: https://datatracker.ietf.org/doc/html/rfc2119