diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index 8d6ba17442f..8a7400e5520 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -596,6 +596,7 @@ peps/pep-0712.rst @ericvsmith
 peps/pep-0713.rst @ambv
 peps/pep-0714.rst @dstufft
 peps/pep-0715.rst @dstufft
+peps/pep-0716.rst @dstufft
 peps/pep-0718.rst @gvanrossum
 peps/pep-0719.rst @Yhg1s
 peps/pep-0720.rst @FFY00
diff --git a/pep-0716.rst b/pep-0716.rst
new file mode 100644
index 00000000000..e21b8d21cbd
--- /dev/null
+++ b/pep-0716.rst
@@ -0,0 +1,319 @@
+PEP: 716
+Title: Normalization of Project Names in Metadata and Filenames
+Author: Donald Stufft
+PEP-Delegate: Paul Moore
+Discussions-To:
+Status: Draft
+Type: Standards Track
+Topic: Packaging
+Content-Type: text/x-rst
+Created: 11-Jun-2023
+Post-History:
+
+
+Abstract
+========
+
+This PEP standardizes on where and when project names should and should not be
+normalized in the packaging toolchain.
+
+
+Motivation
+==========
+
+Historically there were effectively few to no requirements on the valid values
+of names in the packaging ecosystem. Projects that wanted to interpret those
+names had to cope with a wide range of values, and had to each implement their
+own normalization schemes to try and detect names that were the same, but
+"spelled" differently.
+
+Over the intervening years, various PEPs have ratcheted down various pieces of
+metadata such as version numbers (:pep:`440`), filenames in bdists (:pep:`427`),
+names in the Simple API (:pep:`503`), and filenames in sdists (:pep:`625`).
+
+Unfortunately, a complex interaction between these various standards *and*
+changes made to the specifications without an associated PEP has created a
+situation where the ecosystem is in an inconsistent and broken state with
+regard to normalization of names.
+
+The brokenness is currently around ``.``, but the underlying issue actually
+affects any unnormalized name that is being emitted.
+
+The path to getting to where we are today was roughly:
+
+1. :pep:`427` was accepted with two different requirements on what was a valid
+   filename. One requirement was specified in prose which, if read strictly,
+   did not actually make sense; the other was implemented in code and did.
+
+   Rather than normalization, :pep:`427` focused on what was *valid* and
+   provided a way to escape characters that were not valid in the filename but
+   were valid in the "source" metadata (Name and Version).
+
+   All tools at this time implemented the second of those requirements and
+   escaped as expected.
+2. :pep:`440` was accepted, which put strict requirements on what was a valid
+   version and defined a normalization procedure for valid but differently
+   specified versions.
+
+   This normalization used ``-``, which was an invalid value for :pep:`427` and
+   required escaping to ``_``, so :pep:`440` was extended to allow that as an
+   optional spelling of ``_``, which would normalize to ``-``.
+3. :pep:`503` was accepted, which specified a normalization of the project name
+   *when* querying the Simple API for a project.
+4. The spec for ``.dist-info`` required normalization of the name (using the
+   :pep:`503` rules) but did not specify a requirement on version. The :pep:`503`
+   normalization uses ``-``, but tools that locate ``.dist-info`` use the ``-``
+   character to split between name and version, so in practice nobody was
+   following this requirement.
+
+   Thus, the ``.dist-info`` spec was `updated `__,
+   without a PEP, to make the spec more closely align with common practice. The
+   result is that the spec states that the name must be normalized as
+   per :pep:`503` and versions must be normalized as per :pep:`440`, but
+   escaping ``-`` with ``_``.
+
+   This change recognized that there are many existing ``.dist-info`` directories
+   that are not normalized, and thus instructs tools to expect ``.dist-info``
+   directories with unnormalized values, but that all tools must write normalized
+   values going forward.
+5. It was `noted `__
+   that :pep:`427` required that the segments of the filename contain only
+   alphanumeric characters, ``_``, and ``.``, and that all other characters must
+   be escaped with ``_``. However, :pep:`440` allows the use of ``!`` and ``+``,
+   which meant that those characters got escaped to ``_``, which could then not
+   be parsed back into their original versions.
+6. As a result of that discussion, the Wheel specs were `updated `__
+   and then `updated again `__,
+   without a PEP, to require that versions were normalized using :pep:`440` and
+   then ``-`` was escaped with ``_``.
+
+   That change also required that runs of ``-_.`` should be replaced with ``_``
+   as well as lowercasing everything. It noted that it was equivalent to :pep:`503`
+   normalization followed by replacing ``-`` with ``_``.
+
+   It was `noted 9 months later `__, that
+   there wasn't much discussion on the change to name normalization, but that it
+   landed anyways.
+7. :pep:`621` was accepted, which provided a way to specify project metadata in
+   ``pyproject.toml``. This PEP was careful to make the distinction between
+   static metadata, where tools could trust the values in ``pyproject.toml`` and
+   dynamic metadata where they could not.
+
+   However, this PEP doesn't clarify whether static values must be identical
+   values or equivalent values. To make matters worse, it includes the statement
+   that tools should normalize the name, using :pep:`503` rules, as soon as it
+   is read for internal consistency.
+8. :pep:`625` was accepted, standardizing on a format for filenames for sdists.
+   This PEP requires that the project names are normalized "as described in the
+   wheel spec", which at the time meant full :pep:`503` normalization, and
+   versions normalized as per :pep:`440`.
+
+
+Independently of all of the above, and prior to (4), PyPI had implemented a
+check that ensured that the filename being uploaded matched the current project
+name. This check did not correctly take into account normalization, but did take
+into account filename escaping. It also implements renames by allowing projects
+to rename themselves by changing their project name in their metadata.
+
+The effect of all of the above is that we're now in a situation where:
+
+* Some tools will normalize the filename before writing them, either to the
+  filesystem or to PyPI.
+* Some tools will normalize the project name before emitting them to either
+  ``METADATA`` or to PyPI.
+* Some tools (PyPI) require that the filename and the project name match, without
+  taking normalization into account.
+* Some tools (Artifactory) require that the filenames are not normalized.
+* The above sets of tools do not perfectly overlap in any direction.
+
+We've essentially created a mess where nobody is emitting filenames in quite the
+same way and the normalization rules, first defined in :pep:`503`, are being used
+in contexts where it is not appropriate to do so.
+
+
+Rationale
+=========
+
+This PEP follows two guiding principles:
+
+1. Names are provided by people and should be used as is where possible. The
+   name of the project, and how it appears, is a fundamental property of the
+   project.
+2. When interpreting names, tooling should normalize values as much as
+   possible to reduce confusion.
+
+This follows the original intent behind the normalization in :pep:`503`, which
+was designed to be a normalization applied when two computers spoke to each
+other, not as something that would "leak" out into the human-facing areas.
+
+
+Specification
+=============
+
+The project name that is specified by an author ends up flowing through several
+parts of the ecosystem, and each part needs to be considered on its own to
+determine what kind of name (normalized or not) makes sense in that part.
+
+In general, we follow the guiding principles, use the unnormalized name as
+provided by the author wherever possible, and normalize strictly where not.
+
+In some cases, we are simply repeating the status quo; this is done to provide
+clarification and to be explicit about which uses were considered as part of
+this PEP.
+
+
+Core Metadata
+-------------
+
+The ``Name`` field **MUST NOT** be normalized when emitting into ``METADATA``
+or ``PKG-INFO``.
+
+The ``Name`` field **MUST NOT** be normalized when uploading to a repository.
+
+The ``Name`` field **SHOULD NOT** be normalized when being presented for display
+to a user.
+
+The ``Name`` field **MUST** be normalized during comparison.
+
+Tools that read the ``Name`` field from a core metadata file **MUST** be prepared
+to accept unnormalized names.
+
+
+pyproject.toml
+--------------
+
+The ``project.name`` key **MUST** be preserved exactly as the author chose to
+represent it, and **MUST** be emitted in this way into ``METADATA`` or
+``PKG-INFO``.
+
+The ``project.name`` field **MUST** be normalized during comparison.
+
+
+.dist-info directories
+----------------------
+
+The directory name follows the pattern of ``{name}-{version}.dist-info``.
+
+The ``name`` field **MUST** be normalized, with any resulting ``-`` escaped to ``_``.
+
+Tools that read an arbitrary ``.dist-info`` directory **MUST** be prepared to
+accept unnormalized values; however, tools that work only on *new* ``.dist-info``
+directories **SHOULD** validate that all values are normalized.
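
The comparison and ``.dist-info`` rules above can be illustrated with a minimal
sketch. The regular expression is the :pep:`503` normalization; the helper names
``names_match`` and ``dist_info_dirname`` are hypothetical and exist only for
this illustration, and the version is assumed to be already normalized and
escaped by the caller.

```python
import re


def normalize_name(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def names_match(left: str, right: str) -> bool:
    """Compare two project names on their normalized forms (hypothetical helper)."""
    return normalize_name(left) == normalize_name(right)


def dist_info_dirname(name: str, version: str) -> str:
    """Build a ``{name}-{version}.dist-info`` directory name (hypothetical helper).

    Assumes ``version`` has already been normalized and escaped by the caller.
    """
    # Normalize first, then escape the resulting '-' to '_' so that the
    # single remaining '-' separates name from version.
    escaped = normalize_name(name).replace("-", "_")
    return f"{escaped}-{version}.dist-info"


if __name__ == "__main__":
    # The author's spelling is kept for display; only the comparison key
    # and the directory name use the normalized form.
    print(names_match("Zope.Interface", "zope-interface"))
    print(dist_info_dirname("Zope.Interface", "6.0"))
```

Keeping normalization in one function and calling it only at comparison time
and when writing the directory name leaves the author's original spelling
untouched everywhere else, which is the split this section describes.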
+
+
+Source and Binary Distributions
+-------------------------------
+
+Both the sdist and bdist specifications incorporate the project name in their
+filenames (``{name}-{version}.tar.gz`` and
+``{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl``
+respectively).
+
+The ``name`` field **MUST** be non-normalized, with the exception that any ``-``
+**MUST** be escaped to be ``_``.
+
+Tools that accept an arbitrary distribution **MUST** be prepared to accept both
+non-normalized and normalized filenames. However, tools that only work on *new*
+distributions **SHOULD** validate that the distribution filenames are not
+normalizing ``name``.
+
+
+Simple Repository API
+---------------------
+
+The project name, when returned in the "index" URL (e.g. ``/simple/``)
+**MUST** be non-normalized.
+
+The project name when used in the URL (e.g. ``/simple/$project/``) **MUST** be
+normalized.
+
+The project name, when used on the Project detail page
+(e.g. ``/simple/$project/``), **MUST** be non-normalized.
+
+Tools that read values for filenames and names from the Simple Repository API
+**MUST** be prepared to handle both normalized and non-normalized names.
+
+
+Backwards Compatibility
+=======================
+
+This PEP breaks compatibility in a few ways:
+
+* Tools that are currently emitting filenames where ``name`` has been normalized
+  in accordance with the current spec are immediately no longer compliant and
+  must be updated to emit non-normalized names.
+
+  * This is mitigated by the fact that all tools are required to continue to
+    accept both normalized and non-normalized filenames unless they *know* that
+    they only work on *new* distributions (PyPI uploads, ``pyproject-build``, etc).
+
+* Tools that emit normalized names into ``METADATA``, ``PKG-INFO``, or when
+  uploading to a repository are immediately no longer compliant and must be
+  updated to emit non-normalized names.
+
+  * It's unclear in the current spec whether names were intended to be normalized
+    in this case or not, but the practice of normalization here has caused a
+    number of people to be confused about why their names are different from
+    what they've entered.
+
+* Tools that are currently emitting the names in the simple API (outside of the URL
+  itself) as normalized, which is either allowed or required by the spec
+  currently, are immediately no longer compliant and must be updated to emit
+  non-normalized names.
+
+  * Like for filenames, this is mitigated by the fact that all tools are required
+    to continue to accept both normalized and non-normalized values.
+
+
+Tools that validate *new* values should ideally start warning on now-invalid
+options for some period of time, before starting to hard fail when encountering
+them.
+
+
+Rejected Ideas
+==============
+
+Require Normalization Everywhere
+--------------------------------
+
+One other possible idea is to simply require normalization everywhere; however,
+this PEP rejects that.
+
+The primary reason we reject it is that the name of a project is not an internal
+identifier, but is central to that project's identity. Projects often have
+strong opinions on the way that their project's name should look, and
+normalization removes that from them.
+
+There are situations where we need a normalized value, so this PEP does use
+them, but attempts to use them sparingly, only when they're actually required.
+It treats normalization as something that is done when software is talking to
+software about a project, and not when humans are talking about it.
+
+
+Require Normalization in Filenames
+----------------------------------
+
+Filenames sit in a weird place: in most cases they are produced by software
+and are consumed by software, so in theory it should be fine to normalize them,
+which has some nice properties.
+
+However, this PEP rejects doing that.
+
+Although they are often a software-to-software identifier, they are also used by
+humans when sharing and manually downloading the software. They appear in places
+like the PyPI UI, GitHub Releases, downstream Linux repositories, etc. In some
+cases the only incarnation of the project's name someone might see is the name
+embedded into the filename.
+
+Further, historically filenames were not normalized, and a change to the spec
+that did not go through the PEP process is what required it. However, prior to
+that change, people had created systems that rely on encoding information into
+the project name, such as namespaces using the ``.`` character, which a
+requirement to normalize would break.
+
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.