PEP 3131: Fix links (#4263)
* PEP 3131: Replace 404 with Internet Archive link

* PEP 3131: Use https

* PEP 3131: Move references inline
hugovk authored Feb 11, 2025
1 parent 2b95453 commit 1c2770d
Showing 1 changed file with 19 additions and 23 deletions.
42 changes: 19 additions & 23 deletions peps/pep-3131.rst
@@ -57,8 +57,9 @@ an additional policy is necessary, anyway.
Specification of Language Changes
=================================

-The syntax of identifiers in Python will be based on the Unicode standard annex
-UAX-31 [1]_, with elaboration and changes as defined below.
+The syntax of identifiers in Python will be based on the `Unicode standard annex
+UAX-31 <https://www.unicode.org/reports/tr31/>`__, with elaboration and changes
+as defined below.

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
are the same as in Python 2.5. This specification only introduces additional
@@ -69,9 +70,10 @@ the ``unicodedata`` module.
The identifier syntax is ``<XID_Start> <XID_Continue>*``.

The exact specification of what characters have the XID_Start or
-XID_Continue properties can be found in the DerivedCoreProperties
-file of the Unicode data in use by Python (4.1 at the time this
-PEP was written), see [6]_. For reference, the construction rules
+XID_Continue properties can be found in the `DerivedCoreProperties
+file <https://www.unicode.org/Public/4.1.0/ucd/DerivedCoreProperties.txt>`__
+of the Unicode data in use by Python (4.1 at the time this
+PEP was written). For reference, the construction rules
for these sets are given below. The XID_* properties are derived
from ID_Start/ID_Continue, which are derived themselves.
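
The identifier rule quoted in this hunk, ``<XID_Start> <XID_Continue>*``, can be explored from Python itself. The following is a minimal sketch, not part of the PEP or of this commit: the helper name ``matches_identifier_syntax`` is hypothetical, and it relies on ``str.isidentifier()``, which reflects the Unicode data of the running interpreter rather than Unicode 4.1.

```python
def matches_identifier_syntax(name: str) -> bool:
    """Check `name` against <XID_Start> <XID_Continue>*, one character at a time.

    Hypothetical helper for illustration; in practice `name.isidentifier()`
    performs the whole check in a single call.
    """
    if not name:
        return False
    # A single character is a valid identifier only if it may start one
    # (XID_Start, or the underscore that Python also accepts).
    start_ok = name[0].isidentifier()
    # Prefixing a known-good start character turns the identifier check
    # into a test of whether `ch` may continue an identifier.
    rest_ok = all(("a" + ch).isidentifier() for ch in name[1:])
    return start_ok and rest_ok


for candidate in ["naïve", "π", "x2", "2fast", "a-b"]:
    print(candidate, matches_identifier_syntax(candidate))
```

The per-character form mirrors the grammar only for readability. Note that ``str.isidentifier()`` does not reject keywords such as ``class``.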

@@ -94,7 +96,7 @@ comparison of identifiers is based on NFKC.

A non-normative HTML file listing all valid identifier characters for
Unicode 4.1 can be found at
-http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
+https://web.archive.org/web/20081016132748/http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.
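
The context line of this hunk notes that comparison of identifiers is based on NFKC. As a rough illustration (not part of the PEP or of this commit), the sketch below uses ``unicodedata.normalize`` to show two spellings that NFKC maps to the same string; the helper name ``normalize_identifier`` is an assumption made for the example.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Return the NFKC form of `name` (hypothetical helper for illustration)."""
    return unicodedata.normalize("NFKC", name)


ligature = "\ufb01le"   # begins with U+FB01 LATIN SMALL LIGATURE FI
fullwidth = "\uff21"    # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A

# Both compatibility characters fold to their plain ASCII equivalents.
print(normalize_identifier(ligature) == "file")
print(normalize_identifier(fullwidth) == "A")
```

Under this normalization, the ligature spelling and plain ``file`` would be treated as the same name when used as identifiers.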

Policy Specification
====================
@@ -136,8 +138,9 @@ The following changes will need to be made to the parser:
Open Issues
===========

-John Nagle suggested consideration of Unicode Technical Standard #39,
-[2]_, which discusses security mechanisms for Unicode identifiers.
+John Nagle suggested consideration of `Unicode Technical Standard #39
+<https://www.unicode.org/reports/tr39/>`__,
+which discusses security mechanisms for Unicode identifiers.
It's not clear how that can precisely apply to this PEP; possible
consequences are

@@ -153,7 +156,8 @@ needs two identifiers to compare them for confusion - is it possible
to somehow apply it to a single identifier only, and warn?

In follow-up discussion, it turns out that John Nagle actually
-meant to suggest UTR#36, level "Highly Restrictive", [3]_.
+meant to suggest `UTR#36 <https://www.unicode.org/reports/tr36/>`__,
+level "Highly Restrictive".

Several people suggested to allow and ignore formatting control
characters (general category Cf), as is done in Java, JavaScript, and
@@ -164,15 +168,17 @@ later.
Some people would like to see an option on selecting support
for this PEP at run-time; opinions vary on what precisely
that option should be, and what precisely its default value
-should be. Guido van Rossum commented in [5]_ that a global
-flag passed to the interpreter is not acceptable, as it would
+should be. `Guido van Rossum commented
+<https://mail.python.org/pipermail/python-3000/2007-May/007925.html>`__
+that a global flag passed to the interpreter is not acceptable, as it would
apply to all modules.

Discussion
==========

-Ka-Ping Yee summarizes discussion and further objection
-in [4]_ as such:
+`Ka-Ping Yee summarizes discussion and further objection
+<https://mail.python.org/pipermail/python-3000/2007-June/008161.html>`__
+as such:

A. Should identifiers be allowed to contain any Unicode letter?

@@ -250,16 +256,6 @@ F. Which normalization form should be used, NFC or NFKC?
G. Should source code be required to be in normalized form?


-References
-==========
-
-.. [1] http://www.unicode.org/reports/tr31/
-.. [2] http://www.unicode.org/reports/tr39/
-.. [3] http://www.unicode.org/reports/tr36/
-.. [4] https://mail.python.org/pipermail/python-3000/2007-June/008161.html
-.. [5] https://mail.python.org/pipermail/python-3000/2007-May/007925.html
-.. [6] http://www.unicode.org/Public/4.1.0/ucd/DerivedCoreProperties.txt
Copyright
=========
