From 32f0414a69d89826090234bf56e922c987ae7053 Mon Sep 17 00:00:00 2001
From: David Weikersdorfer <517608+Danvil@users.noreply.github.com>
Date: Wed, 16 Jul 2025 12:13:06 -0700
Subject: [PATCH 1/9] RFC: ID_Compat_Math characters allowed in identifiers

---
 text/0000-compat-math-identifiers.md | 143 +++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)
 create mode 100644 text/0000-compat-math-identifiers.md

diff --git a/text/0000-compat-math-identifiers.md b/text/0000-compat-math-identifiers.md
new file mode 100644
index 00000000000..524b6675a57
--- /dev/null
+++ b/text/0000-compat-math-identifiers.md
@@ -0,0 +1,143 @@
+- Feature Name: `compat_math_identifiers`
+- Start Date: 2025-07-16
+- RFC PR: [TODO rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- Rust Issue: [TODO rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)
+
+# Summary
+[summary]: #summary
+
+Rust already supports a wide range of unicode characters in identifiers - for example `α`, `номер`, `عدد`, `数`, `संख्या` are all valid Rust identifiers.
+This feature extends the set of Unicode character which can be used in identifiers with [`ID_Compat_Math_Start`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Start%3DYes%3A%5D&g=&i=idtype) and [`ID_Compat_Math_Continue`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Continue%3DYes%3A%5D&g=&i=idtype), most notable: `∇`, `∂`, `∞`, subscripts `⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾` and superscripts `₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎`.
+This can be a boon to implementers of scientific concepts as they can write for example `let ∇E₁₂ = 0.5;`.
+
+# Motivation
+[motivation]: #motivation
+
+Programming languages have historically focused on the quite narrow set of ASCII characters, however developers from other cultures or specialized problem spaces can benefit from using characters which are native to their culture or domain.
+The vast body of scientific literature uses a variety of characters to express concepts from physics, mathematics, biology, robotics and many others.
+Symbols typically appearing in equations are Roman letters like `x`, Greek letters like `γ`, differentiation operators like `∂` (partial derivative) and `∇` (gradient).
+Variables like `x` are often adorned with subscripts like `x₁₂` and regularly also with superscripts like `γ⁺` or `x⁽²⁾`.
+Having these symbols available as Rust identifiers could simplify the implementation of these concepts and stay closer to a reference publication, thus reducing confusing and implementation errors.
+
+For example instead of:
+```
+let gradient_energy_1 = 2.0 * (position_1 - center_1);
+let gradient_energy_2 = 2.0 * (position_2 - center_2);
+```
+one could write:
+```
+let ∇E₁ = 2.0 * (p₁ - c₁);
+let ∇E₂ = 2.0 * (p₂ - c₂);
+```
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+If needed you can use mathematical symbols like `α`, `∇`, or `∂` as part of an identifier when implementing scientific concepts.
+
+In addition you can use subscript and superscripts for your identifiers, for example you can write `a₁₂` instead of `a_12`, or `a⁺` instead of `a_plus`.
+Note that you cannot start an identifier with a subscript or superscript, for example `₁a` will give a compiler error.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+The Unicode sets [`ID_Compat_Math_Start`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Start%3DYes%3A%5D&g=&i=idtype) and [`ID_Compat_Math_Continue`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Continue%3DYes%3A%5D&g=&i=idtype) consist of the following characters:
+
+1) `∂∇`
+2) `∞`
+3) `₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎`
+4) `⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾`
+5) `𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃` (italic and bold versions of `∂∇`)
+
+The characters 1) - 4) are added to the set of Rust identifiers.
+
+The characters 5) are added to the set of Rust identifiers, but will trigger an NFKC warning when used:
+```
+warning: identifier contains a non normalized (NFKC) character: '𝛁'
+```
+similarly to how characters like `𝑥` (instead of `x`) or `𝑓` (instead of `f`) are triggering this warning today.
+
+Note that if breaking precedence is desired I would suggest to not add the characters from 5).
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+* Characters like `𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃` are easily confusable with their base versions `∂∇` and can lead to subtle bugs. However the precedence in Rust seems to be to add them alongside their base version but trigger the NCKC warning.
+
+* Some developers prefer to only use ASCII characters for programming. This paradigm can be enforced today via `deny(non_ascii_idents)`. This would disallow all characters added by this RFC.
+
+* The superscript characters can be confused with actual mathematical operations. For example someone might write `let a = 2.0; let b = 3.0 * a²;` and be confused that this will result in a compiler error. There might also be the potential for subtle bugs like `let a² = 2; let a = 2; let x = a²;` and erronously assuming that `x = 4`, however one can argue that this is not due to the superscript characters as it can happen as well when only using ASCII characters: `let a_sq = 2; let a = 2; let x = a_sq;`.
+
+* Some people might find it difficult to read superscript and subscript letters on lower resolution screens or when using small font sizes.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+If this RFC is not implemented then everyone has to keep using ASCII characters like `gradient_energy` or `a01`.
+
+The impact of not implementing it should be fairly small, but implementing it could invite more scientific oriented people to the Rust language and make it easier for them to implement complex concepts.
+
+# Prior art
+[prior-art]: #prior-art
+
+Rust has the philosophy to be open to various cultures and languages and allow them use their native symbols as identifiers.
+Example which compiles in stable Rust without warning:
+```
+fn main() {
+    let λ = 2.718_f32; // Greek letter lambda
+    let 파이 = 3.141_f32; // Greek letter pi
+    let значення = λ.abs() + 파이.abs();
+    println!("{значення}");
+}
+```
+
+Many of these characters are easily confusable if not attuned to the corresponding language or culture.
+Example which compiles in stable Rust without warning:
+```
+fn main() {
+    let 鳯 = 3; // U+9CF5: "phoenix" (old variant)
+    let 鳳 = 4; // U+9CF4: modern simplified/traditional
+    let 隱 = 5; // U+96B1: “hidden”
+    let 隠 = 6; // U+96B0: nearly identical glyph
+    println!("鳯 = {}, 鳳 = {}", 鳯, 鳳);
+    println!("隱 = {}, 隠 = {}", 隱, 隠);
+}
+
+```
+
+A vast set of characters added as part of Unicode character sets are easily confusable with other characters.
+Example which compiles in stable Rust and triggers a "warning: identifier contains a non normalized (NFKC) character":
+```
+fn main() {
+    let l = 1.0;
+    let ℓ = l + 2.0;
+    let 𝑓𝑢𝑛𝑐 = |𝑥: f32| 𝑥 * ℓ + l;
+    let Σ = (1..5).map(|𝑖| 𝑓𝑢𝑛𝑐(𝑖 as f32)).sum::<f32>();
+    println!("∑: {Σ}");
+}
+```
+
+There are characters which one might argue should never have been added but are part of allowed Unicode sets.
+Example which compiles in stable Rust and triggers a "warning: identifier contains an uncommon character":
+```
+fn ᅟ() { // U+115F (Hangul Choseong Filler) renders as blank
+    println!("boo");
+}
+
+fn main() {
+    ᅟ(); // dito
+}
+```
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+- Are there other character sets which could be added as part of this RFC?
+- Should the italic and bold versions of characters `∂∇` be added?
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+Rust has chosen the path of allowing non-ASCII characters as identifiers and this RFC adds some more characters which are useful to the scientific domain.
+
+There might be other usefule sets of characters which could be added in the future.
From 60dabcaf0dac06fc386fa1486f1bf17c4e96e243 Mon Sep 17 00:00:00 2001
From: David Weikersdorfer <517608+Danvil@users.noreply.github.com>
Date: Wed, 16 Jul 2025 16:19:30 -0700
Subject: [PATCH 2/9] Links to Unicode and various improvements

* Added links to UAX31 and others as requested in CR
* Fixed typos as requested in CR
* Extended the drawbacks section
* Other improvements
---
 text/0000-compat-math-identifiers.md | 56 ++++++++++++++++------------
 1 file changed, 33 insertions(+), 23 deletions(-)

diff --git a/text/0000-compat-math-identifiers.md b/text/0000-compat-math-identifiers.md
index 524b6675a57..7dfbbf9b50c 100644
--- a/text/0000-compat-math-identifiers.md
+++ b/text/0000-compat-math-identifiers.md
@@ -7,7 +7,7 @@
 [summary]: #summary

 Rust already supports a wide range of unicode characters in identifiers - for example `α`, `номер`, `عدد`, `数`, `संख्या` are all valid Rust identifiers.
-This feature extends the set of Unicode character which can be used in identifiers with [`ID_Compat_Math_Start`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Start%3DYes%3A%5D&g=&i=idtype) and [`ID_Compat_Math_Continue`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Continue%3DYes%3A%5D&g=&i=idtype), most notable: `∇`, `∂`, `∞`, subscripts `⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾` and superscripts `₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎`.
+This RFC extends the set of Unicode characters which can be used in identifiers with [`ID_Compat_Math_Start`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Start%3DYes%3A%5D&g=&i=idtype) and [`ID_Compat_Math_Continue`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Continue%3DYes%3A%5D&g=&i=idtype), most notably: `∇`, `∂`, `∞`, superscripts `⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾` and subscripts `₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎`.
 This can be a boon to implementers of scientific concepts as they can write for example `let ∇E₁₂ = 0.5;`.

 # Motivation
@@ -15,8 +15,8 @@ This can be a boon to implementers of scientific concepts as they can write for

 Programming languages have historically focused on the quite narrow set of ASCII characters, however developers from other cultures or specialized problem spaces can benefit from using characters which are native to their culture or domain.
 The vast body of scientific literature uses a variety of characters to express concepts from physics, mathematics, biology, robotics and many others.
-Symbols typically appearing in equations are Roman letters like `x`, Greek letters like `γ`, differentiation operators like `∂` (partial derivative) and `∇` (gradient).
-Variables like `x` are often adorned with subscripts like `x₁₂` and regularly also with superscripts like `γ⁺` or `x⁽²⁾`.
+Symbols often appearing in equations are Roman letters like `x`, Greek letters like `θ`, and differentiation operators like `∂` and `∇`.
+Variables are often adorned with subscripts like `x₁₂` or superscripts like `x⁺` or `x⁽²⁾`.
 Having these symbols available as Rust identifiers could simplify the implementation of these concepts and stay closer to a reference publication, thus reducing confusing and implementation errors.

 For example instead of:
@@ -33,47 +33,57 @@ let ∇E₂ = 2.0 * (p₂ - c₂);
 # Guide-level explanation
 [guide-level-explanation]: #guide-level-explanation

-If needed you can use mathematical symbols like `α`, `∇`, or `∂` as part of an identifier when implementing scientific concepts.
+If needed you can use mathematical symbols like `θ`, `∇`, or `∂` as part of an identifier when implementing scientific concepts.

-In addition you can use subscript and superscripts for your identifiers, for example you can write `a₁₂` instead of `a_12`, or `a⁺` instead of `a_plus`.
-Note that you cannot start an identifier with a subscript or superscript, for example `₁a` will give a compiler error.
+In addition you can use subscripts and superscripts in your identifiers, for example you can write `x₁₂` instead of `x_12`, or `x⁺` instead of `x_plus`.
+Note that you cannot start an identifier with a subscript or superscript, for example `₁x` will give a compiler error.

 # Reference-level explanation
 [reference-level-explanation]: #reference-level-explanation

-The Unicode sets [`ID_Compat_Math_Start`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Start%3DYes%3A%5D&g=&i=idtype) and [`ID_Compat_Math_Continue`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Continue%3DYes%3A%5D&g=&i=idtype) consist of the following characters:
+The Unicode sets [`ID_Compat_Math_Start`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Start%3DYes%3A%5D&g=&i=idtype) and [`ID_Compat_Math_Continue`](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AID_Compat_Math_Continue%3DYes%3A%5D&g=&i=idtype) as defined in [Unicode Standard Annex #31 (UAX31)](https://www.unicode.org/reports/tr31/#Standard_Profiles) are part of the Unicode mathematical compatibility notation profile and consist of the following characters:

-1) `∂∇`
-2) `∞`
-3) `₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎`
-4) `⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾`
-5) `𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃` (italic and bold versions of `∂∇`)
+1) `∂` and `∇` from [Miscellaneous mathematical symbols](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BNames_List_Subheader=Miscellaneous%20mathematical%20symbols%7D),
+2) `∞` from [Miscellaneous mathematical symbol](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BNames_List_Subheader=Miscellaneous%20mathematical%20symbol%7D),
+3) `₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎` from [Subscripts](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BNames_List_Subheader=Subscripts%7D),
+4) `⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾` from [Superscripts](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BNames_List_Subheader=Superscripts%7D) and [Latin-1 punctuation and symbols](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BNames_List_Subheader=Latin-1%20punctuation%20and%20symbols%7D),
+5) `𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃` (italic and bold versions of `∂` and `∇`) from various sets like [Bold Greek symbols](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BNames_List_Subheader=Bold%20Greek%20symbols%7D).

-The characters 1) - 4) are added to the set of Rust identifiers.
+The characters 1) - 4) are added to the set of characters allowed in Rust identifiers.
+[UAX31](https://www.unicode.org/reports/tr31/#Standard_Profiles) notes that "supporting these characters is recommended for some computer languages because they can be beneficial in some applications".
+These characters will not have syntactic use and are only added to the set of characters allowed in identifiers following the recommendations of [UAX31-R3b](https://www.unicode.org/reports/tr31/#R3b).
+For example `let a = 2.0; let b = a²;` will naturally give a compiler error that `a²` is an unknown identifier and not be interpreted as `let b = a * a;`.
+Similarly `let a = [2, 0]; let b = a₁;` will naturally give a compiler error that `a₁` is an unknown identifier and not be interpreted as `let b = a[0];`.

 The characters 5) are added to the set of Rust identifiers, but will trigger an NFKC warning when used:
 ```
 warning: identifier contains a non normalized (NFKC) character: '𝛁'
 ```
-similarly to how characters like `𝑥` (instead of `x`) or `𝑓` (instead of `f`) are triggering this warning today.
+similarly to how characters like `𝑥` (instead of `x`) or `𝑓` (instead of `f`) are triggering this warning in stable Rust today.
+This follows the guidelines from the [Unicode Technical Standard #55 - Source Code Handling (UTS55)](https://www.unicode.org/reports/tr55/#General-Security-Profile) which recommends that "implementations should provide a mechanism to warn about identifiers that are not in the General Security Profile for Identifiers" as defined in the [Unicode Technical Standard #39 - Unicode Security Mechanisms (UTS39)](https://www.unicode.org/reports/tr39/#General_Security_Profile).
+In particular the characters in 5) are identified as "Not_NFKC", i.e. characters that cannot occur in strings normalized to [NFKC](https://unicode.org/reports/tr15/#Norm_Forms).

-Note that if breaking precedence is desired I would suggest to not add the characters from 5).
+It shall be pointed out that Unicode specifically [mentions Rust as a positive industry example](https://www.unicode.org/reports/tr55/#General-Security-Profile) following the recommendations from the General Security Profile

 # Drawbacks
 [drawbacks]: #drawbacks

-* Characters like `𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃` are easily confusable with their base versions `∂∇` and can lead to subtle bugs. However the precedence in Rust seems to be to add them alongside their base version but trigger the NCKC warning.
+* Characters like `𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃` are easily confusable with their base versions `∂∇` and can lead to subtle bugs. However the precedent in Rust seems to be to add them alongside their base version but trigger the NFKC warning.

 * Some developers prefer to only use ASCII characters for programming. This paradigm can be enforced today via `deny(non_ascii_idents)`. This would disallow all characters added by this RFC.

-* The superscript characters can be confused with actual mathematical operations. For example someone might write `let a = 2.0; let b = 3.0 * a²;` and be confused that this will result in a compiler error. There might also be the potential for subtle bugs like `let a² = 2; let a = 2; let x = a²;` and erronously assuming that `x = 4`, however one can argue that this is not due to the superscript characters as it can happen as well when only using ASCII characters: `let a_sq = 2; let a = 2; let x = a_sq;`.
+* The superscript characters could be confused with actual mathematical operations. For example someone might write `let a = 2.0; let b = 3.0 * a²;` and be confused that this will result in a compiler error. There might also be the potential for subtle bugs like `let a² = 2; let a = 2; let x = a²;` and erroneously assuming that `x = 4`, however one can argue that this is not due to the superscript characters as it can happen as well when only using ASCII characters: `let a_sq = 2; let a = 2; let x = a_sq;`.
+
+* The subscript characters could be confused with indexing operations. For example someone might write `let a = [2, 0]; let b = a₁;` and be confused that this will result in a compiler error.

 * Some people might find it difficult to read superscript and subscript letters on lower resolution screens or when using small font sizes.

+* Rust might want to decide in the future to give certain superscripts and subscripts syntactic meaning. For example they might want to interpret `a²` as `a * a` or `a₁` as `a[0]`. The latter sounds espcially unlikely though due to the general disagreement of 0-based vs 1-based indexing.
+
 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives

-If this RFC is not implemented then everyone has to keep using ASCII characters like `gradient_energy` or `a01`.
+If this RFC is not implemented then everyone has to keep using ASCII characters for identifiers in scientific code, for example `gradient_energy` or `a_12`.

 The impact of not implementing it should be fairly small, but implementing it could invite more scientific oriented people to the Rust language and make it easier for them to implement complex concepts.

@@ -84,9 +94,9 @@ Rust has the philosophy to be open to various cultures and languages and allow t
 Example which compiles in stable Rust without warning:
 ```
 fn main() {
-    let λ = 2.718_f32; // Greek letter lambda
-    let 파이 = 3.141_f32; // Greek letter pi
-    let значення = λ.abs() + 파이.abs();
+    let λ = 2.718_f32;       // Greek letter lambda
+    let 파이 = 3.141_f32;    // Korean word for "pie"
+    let значення = λ.abs() + 파이.abs(); // Cyrillic
     println!("{значення}");
 }
 ```
@@ -133,11 +143,11 @@ fn main() {
 [unresolved-questions]: #unresolved-questions

 - Are there other character sets which could be added as part of this RFC?
-- Should the italic and bold versions of characters `∂∇` be added?
+- Should the italic and bold versions of characters `∂` and `∇` be added?

 # Future possibilities
 [future-possibilities]: #future-possibilities

 Rust has chosen the path of allowing non-ASCII characters as identifiers and this RFC adds some more characters which are useful to the scientific domain.

-There might be other usefule sets of characters which could be added in the future.
+There might be other useful sets of characters which could be added in the future.
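The two character sets from the reference-level explanation above can be written as simple predicates. This is a minimal sketch, not code from the RFC or from rustc. The code points are the ones listed for groups 1) to 5). The ASCII-only base rule in `is_ident_with_compat_math` is a hypothetical stand-in for `XID_Start`/`XID_Continue`, which the standard library does not expose.

```rust
/// `ID_Compat_Math_Start` as listed in the RFC: `∂`, `∇`, `∞` and the ten
/// bold/italic variants of `∇` and `∂` from group 5).
fn is_compat_math_start(c: char) -> bool {
    matches!(
        c,
        '\u{2202}' // ∂ PARTIAL DIFFERENTIAL
            | '\u{2207}' // ∇ NABLA
            | '\u{221E}' // ∞ INFINITY
            | '\u{1D6C1}' | '\u{1D6DB}' | '\u{1D6FB}' | '\u{1D715}' | '\u{1D735}'
            | '\u{1D74F}' | '\u{1D76F}' | '\u{1D789}' | '\u{1D7A9}' | '\u{1D7C3}'
    )
}

/// `ID_Compat_Math_Continue`: the start set plus the superscripts and
/// subscripts from groups 3) and 4).
fn is_compat_math_continue(c: char) -> bool {
    is_compat_math_start(c)
        || matches!(
            c,
            '\u{00B2}' | '\u{00B3}' | '\u{00B9}' // ² ³ ¹ live in the Latin-1 block
                | '\u{2070}' // ⁰
                | '\u{2074}'..='\u{207E}' // ⁴ to ⁹, then ⁺ ⁻ ⁼ ⁽ ⁾
                | '\u{2080}'..='\u{208E}' // ₀ to ₉, then ₊ ₋ ₌ ₍ ₎
        )
}

/// Hypothetical identifier check: ASCII letters, digits and `_` stand in for
/// XID_Start / XID_Continue. This is not how rustc's lexer is written.
fn is_ident_with_compat_math(s: &str) -> bool {
    let mut chars = s.chars();
    let Some(first) = chars.next() else {
        return false;
    };
    let start_ok = first == '_' || first.is_ascii_alphabetic() || is_compat_math_start(first);
    start_ok
        && s != "_" // a lone underscore is not an identifier in Rust
        && chars.all(|c| c == '_' || c.is_ascii_alphanumeric() || is_compat_math_continue(c))
}

fn main() {
    for ident in ["∇E₁₂", "x⁽²⁾", "l₀³", "∂f", "₁x", "²a"] {
        println!("{ident}: {}", is_ident_with_compat_math(ident));
    }
}
```

Under this rule, `₁x` and `²a` are rejected. Subscripts and superscripts appear only in `ID_Compat_Math_Continue`, which matches the guide-level statement that an identifier cannot start with them.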
From b52ddabda016e31986100a1d7265fe9bebc63afd Mon Sep 17 00:00:00 2001
From: David Weikersdorfer <517608+Danvil@users.noreply.github.com>
Date: Wed, 16 Jul 2025 17:24:04 -0700
Subject: [PATCH 3/9] Improved grammar of a sentence

---
 text/0000-compat-math-identifiers.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0000-compat-math-identifiers.md b/text/0000-compat-math-identifiers.md
index 7dfbbf9b50c..6222d3221f8 100644
--- a/text/0000-compat-math-identifiers.md
+++ b/text/0000-compat-math-identifiers.md
@@ -63,7 +63,7 @@ similarly to how characters like `𝑥` (instead of `x`) or `𝑓` (instead of `
 This follows the guidelines from the [Unicode Technical Standard #55 - Source Code Handling (UTS55)](https://www.unicode.org/reports/tr55/#General-Security-Profile) which recommends that "implementations should provide a mechanism to warn about identifiers that are not in the General Security Profile for Identifiers" as defined in the [Unicode Technical Standard #39 - Unicode Security Mechanisms (UTS39)](https://www.unicode.org/reports/tr39/#General_Security_Profile).
 In particular the characters in 5) are identified as "Not_NFKC", i.e. characters that cannot occur in strings normalized to [NFKC](https://unicode.org/reports/tr15/#Norm_Forms).

-It shall be pointed out that Unicode specifically [mentions Rust as a positive industry example](https://www.unicode.org/reports/tr55/#General-Security-Profile) following the recommendations from the General Security Profile
+Note that Unicode specifically [mentions Rust as a positive industry example](https://www.unicode.org/reports/tr55/#General-Security-Profile) that follows the recommendations from the General Security Profile.

 # Drawbacks
 [drawbacks]: #drawbacks
From fe3d007b7476388a7c66343b7e6157043a50a481 Mon Sep 17 00:00:00 2001
From: David Weikersdorfer <517608+Danvil@users.noreply.github.com>
Date: Wed, 16 Jul 2025 18:00:31 -0700
Subject: [PATCH 4/9] Added another, longer motivating example

---
 text/0000-compat-math-identifiers.md | 34 +++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/text/0000-compat-math-identifiers.md b/text/0000-compat-math-identifiers.md
index 6222d3221f8..5550c92338a 100644
--- a/text/0000-compat-math-identifiers.md
+++ b/text/0000-compat-math-identifiers.md
@@ -30,6 +30,39 @@ let ∇E₁ = 2.0 * (p₁ - c₁);
 let ∇E₂ = 2.0 * (p₂ - c₂);
 ```

+A longer example from the "wilds":
+```
+fn strain_energy_hessian_coeffs(l0: f64, l: f64) -> [f64; 4] {
+    let l02 = l0.powi(2);
+    let l03 = l0 * l02;
+    let l04 = l0 * l03;
+    let l05 = l0 * l04;
+    let l2 = l.powi(2);
+    let l3 = l * l2;
+
+    let h = (l02 - l2) / (2.0 * l03);
+    let dh = (3.0 * l2 - l02) / (2.0 * l05);
+
+    [1.0 / l3, -1.0 / l03, dh, 1.0 / l0 - 1.0 / l + h]
+}
+```
+With this RFC one could write:
+```
+fn strain_energy_hessian_coeffs(l₀: f64, l: f64) -> [f64; 4] {
+    let l₀² = l₀.powi(2);
+    let l₀³ = l₀ * l₀²;
+    let l₀⁴ = l₀ * l₀³;
+    let l₀⁵ = l₀ * l₀⁴;
+    let l² = l.powi(2);
+    let l³ = l * l²;
+
+    let h = (l₀² - l²) / (2.0 * l₀³);
+    let dh = (3.0 * l² - l₀²) / (2.0 * l₀⁵);
+
+    [1.0 / l³, -1.0 / l₀³, dh, 1.0 / l₀ - 1.0 / l + h]
+}
+```
+
 # Guide-level explanation
 [guide-level-explanation]: #guide-level-explanation

@@ -112,7 +145,6 @@ fn main() {
     println!("鳯 = {}, 鳳 = {}", 鳯, 鳳);
     println!("隱 = {}, 隠 = {}", 隱, 隠);
 }
-
 ```

 A vast set of characters added as part of Unicode character sets are easily confusable with other characters.
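Group 5) in the reference-level explanation triggers the NFKC warning because each of those ten characters has a compatibility decomposition to the plain `∇` (U+2207) or `∂` (U+2202). The sketch below shows that folding. It covers only the ten code points listed in the RFC. It is not a general NFKC implementation and does not reproduce rustc's `uncommon_codepoints` lint.

```rust
/// Base character that a group 5) variant folds to under NFKC, or `None`.
/// Only the ten code points listed in the RFC are covered.
fn nfkc_math_base(c: char) -> Option<char> {
    match c {
        // bold, italic, bold italic, sans-serif bold, sans-serif bold italic NABLA
        '\u{1D6C1}' | '\u{1D6FB}' | '\u{1D735}' | '\u{1D76F}' | '\u{1D7A9}' => Some('\u{2207}'),
        // the same five styles of PARTIAL DIFFERENTIAL
        '\u{1D6DB}' | '\u{1D715}' | '\u{1D74F}' | '\u{1D789}' | '\u{1D7C3}' => Some('\u{2202}'),
        _ => None,
    }
}

/// First character in `ident` that the group 5) rule would flag, if any.
fn first_group5_char(ident: &str) -> Option<char> {
    ident.chars().find(|&c| nfkc_math_base(c).is_some())
}

/// Replaces every group 5) variant in `ident` with its base character.
fn fold_group5(ident: &str) -> String {
    ident.chars().map(|c| nfkc_math_base(c).unwrap_or(c)).collect()
}

fn main() {
    for ident in ["∇E", "𝛁E", "𝜕f"] {
        match first_group5_char(ident) {
            Some(c) => println!("{ident}: '{c}' is not NFKC-stable, folds to {}", fold_group5(ident)),
            None => println!("{ident}: no warning"),
        }
    }
}
```

Folding `𝛁E` yields `∇E`. Two identifiers that differ only this way are the look-alike pair that the first bullet under Drawbacks warns about.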
From e6d4ec22e89c2a85c410aaff72cfcd93627a1472 Mon Sep 17 00:00:00 2001
From: David Weikersdorfer <517608+Danvil@users.noreply.github.com>
Date: Thu, 17 Jul 2025 12:37:06 -0700
Subject: [PATCH 5/9] Changes suggested by CR and expanded alternatives

* Clarified choice between syntactic and identifier use
* Added link to a similar C++ proposal
* Expanded the alternatives section discussing how characters could be given syntactic meaning instead
---
 text/0000-compat-math-identifiers.md | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/text/0000-compat-math-identifiers.md b/text/0000-compat-math-identifiers.md
index 5550c92338a..6398a89f1e4 100644
--- a/text/0000-compat-math-identifiers.md
+++ b/text/0000-compat-math-identifiers.md
@@ -84,11 +84,14 @@ The Unicode sets [`ID_Compat_Math_Start`](https://util.unicode.org/UnicodeJsps/l

 The characters 1) - 4) are added to the set of characters allowed in Rust identifiers.
 [UAX31](https://www.unicode.org/reports/tr31/#Standard_Profiles) notes that "supporting these characters is recommended for some computer languages because they can be beneficial in some applications".
-These characters will not have syntactic use and are only added to the set of characters allowed in identifiers following the recommendations of [UAX31-R3b](https://www.unicode.org/reports/tr31/#R3b).
+
+In other words this RFC proposes to adopt the "Mathematical Compatibility Notation Profile" which in accordance with [UAX31-R3b](https://www.unicode.org/reports/tr31/#R3b) allows these characters in identifiers and in turn prevents syntactic use.
 For example `let a = 2.0; let b = a²;` will naturally give a compiler error that `a²` is an unknown identifier and not be interpreted as `let b = a * a;`.
 Similarly `let a = [2, 0]; let b = a₁;` will naturally give a compiler error that `a₁` is an unknown identifier and not be interpreted as `let b = a[0];`.
+`∞` will just be a character usable in identifiers and not be a synonym for the likes of `f32::INFINITY`.

-The characters 5) are added to the set of Rust identifiers, but will trigger an NFKC warning when used:
+The characters 5) are added to the set of Rust identifiers, but will trigger an NFKC or `uncommon_codepoints` warning when used depending on their Unicode classification.
+For example using `𝛁` in an identifier will trigger:
 ```
 warning: identifier contains a non normalized (NFKC) character: '𝛁'
 ```
@@ -98,6 +101,7 @@ similarly to how characters like `𝑥` (instead of `x`) or `𝑓` (instead of `
 In particular the characters in 5) are identified as "Not_NFKC", i.e. characters that cannot occur in strings normalized to [NFKC](https://unicode.org/reports/tr15/#Norm_Forms).

 Note that Unicode specifically [mentions Rust as a positive industry example](https://www.unicode.org/reports/tr55/#General-Security-Profile) that follows the recommendations from the General Security Profile.
+

 # Drawbacks
 [drawbacks]: #drawbacks
@@ -111,8 +115,6 @@ Note that Unicode specifically [mentions Rust as a positive industry example](h

 * Some people might find it difficult to read superscript and subscript letters on lower resolution screens or when using small font sizes.

-* Rust might want to decide in the future to give certain superscripts and subscripts syntactic meaning. For example they might want to interpret `a²` as `a * a` or `a₁` as `a[0]`. The latter sounds espcially unlikely though due to the general disagreement of 0-based vs 1-based indexing.
-
 # Rationale and alternatives
 [rationale-and-alternatives]: #rationale-and-alternatives

@@ -120,6 +122,19 @@ If this RFC is not implemented then everyone has to keep using ASCII characters

 The impact of not implementing it should be fairly small, but implementing it could invite more scientific oriented people to the Rust language and make it easier for them to implement complex concepts.

+Alternatively Rust could decide to give the proposed characters syntactic meaning.
+
+Superscript characters could be interpreted as potentiation, for example `let a = 2; let b = a²;` could be a synonym to `let a = 2; let b = a * a;`.
+This would open up a host of questions and potential issues, like:
+- Should `a²⁻³` be interpreted as `1/a`?
+- There is no superscript character for multiplication `*`.
+
+`∞` could be a synonym for or replacement of `f32::INFINITY`, however there is no precedent for using non-ASCII characters in `core`/`std` and this would likely meet considerable opposition.
+
+Derivatives could be added as a language features via auto-differentiation techniques thus giving `∇` and `∂` syntactic meaning, however there is no precedence of this in other languages and similar features are usually provided by libraries.
+
+Subscript characters could be given syntatic meaning, for example `a₁` could be a synonym to `a[1]`, however this would be highly contentious and error prone due to the general disagreement between 0-based vs 1-based indexing and would suffer from similar problems as using superscript for potentiation.
+
 # Prior art
 [prior-art]: #prior-art

@@ -171,6 +186,8 @@ fn main() {
 }
 ```

+[C++ P3658R0](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3658r0.pdf) is a similar proposal with similar reasoning for the C++ language. In particular it states that the characters suggested in this RFC were allowed in C++11 to C++20 as originally published.
+
 # Unresolved questions
 [unresolved-questions]: #unresolved-questions

From c1fe8b4ac9fa3d005cbd3fdd2e06c02e1a163d58 Mon Sep 17 00:00:00 2001
From: David Weikersdorfer <517608+Danvil@users.noreply.github.com>
Date: Thu, 17 Jul 2025 14:01:36 -0700
Subject: [PATCH 6/9] Added link to a Rust autodiff experiment and improved some wording

---
 text/0000-compat-math-identifiers.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/text/0000-compat-math-identifiers.md b/text/0000-compat-math-identifiers.md
index 6398a89f1e4..f1e4edb4ed8 100644
--- a/text/0000-compat-math-identifiers.md
+++ b/text/0000-compat-math-identifiers.md
@@ -124,16 +124,19 @@ The impact of not implementing it should be fairly small, but implementing it co

 Alternatively Rust could decide to give the proposed characters syntactic meaning.

-Superscript characters could be interpreted as potentiation, for example `let a = 2; let b = a²;` could be a synonym to `let a = 2; let b = a * a;`.
+Superscript characters could be interpreted as exponentiation, for example `let a = 2; let b = a²;` could be a synonym for `let a = 2; let b = a * a;`.
 This would open up a host of questions and potential issues, like:
 - Should `a²⁻³` be interpreted as `1/a`?
-- There is no superscript character for multiplication `*`.
+- There is no superscript character for multiplication `*` or division `/`.

 `∞` could be a synonym for or replacement of `f32::INFINITY`, however there is no precedent for using non-ASCII characters in `core`/`std` and this would likely meet considerable opposition.

-Derivatives could be added as a language features via auto-differentiation techniques thus giving `∇` and `∂` syntactic meaning, however there is no precedence of this in other languages and similar features are usually provided by libraries.
+Derivatives could be added as a language feature using auto-differentiation techniques and `∇` and `∂` could be given syntactic meaning.
+For example Mathematica supports the syntax `∂ₓf` for a partial derivative of `f` with respect to `x` and the syntax `∇ₓf` for the gradient with respect to `x`.
+Moreover there is [experimental support for automatic differentiation](https://github.com/rust-lang/rust/issues/124509) being worked on for rustc which uses an attribute `#[autodiff(df, ..)]` with a user-provided function name (`df` in this case) for the automatically generated derivative.
+A potential synergy with this RFC would be that the developer can choose `∇f` as the function name of the automatically created derivative via `#[autodiff(∇f, ..)]`, however Rust could also decide to automatically use `∇f` as the name of such derivatives, thereby giving `∇f` syntactic meaning.

-Subscript characters could be given syntatic meaning, for example `a₁` could be a synonym to `a[1]`, however this would be highly contentious and error prone due to the general disagreement between 0-based vs 1-based indexing and would suffer from similar problems as using superscript for potentiation.
+Subscript characters could be given syntactic meaning, for example `a₁` could be a synonym for `a[1]`, however this would be highly contentious and error prone due to the general disagreement between 0-based versus 1-based indexing and would suffer from similar problems as using superscripts for exponentiation.

 # Prior art
 [prior-art]: #prior-art
From b58cbb6ea516bc2a1f7ec1ce2dc2bd8aa7f2bbcb Mon Sep 17 00:00:00 2001
From: David Weikersdorfer <517608+Danvil@users.noreply.github.com>
Date: Thu, 17 Jul 2025 16:22:11 -0700
Subject: [PATCH 7/9] Updated file name with issue number

---
 ...mpat-math-identifiers.md => 3840-compat-math-identifiers.md} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename text/{0000-compat-math-identifiers.md => 3840-compat-math-identifiers.md} (99%)

diff --git a/text/0000-compat-math-identifiers.md b/text/3840-compat-math-identifiers.md
similarity index 99%
rename from text/0000-compat-math-identifiers.md
rename to text/3840-compat-math-identifiers.md
index f1e4edb4ed8..0be699a688a 100644
--- a/text/0000-compat-math-identifiers.md
+++ b/text/3840-compat-math-identifiers.md
@@ -1,6 +1,6 @@
 - Feature Name: `compat_math_identifiers`
 - Start Date: 2025-07-16
-- RFC PR: [TODO rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
+- RFC PR: [rust-lang/rfcs#3840](https://github.com/rust-lang/rfcs/pull/3840)
 - Rust Issue: [TODO rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

 # Summary
From 1113b8addea81515e7112e14409846dce5ddef5b Mon Sep 17 00:00:00 2001
From: David Weikersdorfer <517608+Danvil@users.noreply.github.com>
Date: Sat, 19 Jul 2025 19:03:02 -0700
Subject: [PATCH 8/9] Annotated code blocks with ```rust to enable syntax highlighting

---
 text/3840-compat-math-identifiers.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/text/3840-compat-math-identifiers.md b/text/3840-compat-math-identifiers.md
index 0be699a688a..9f4f056e5d5 100644
--- a/text/3840-compat-math-identifiers.md
+++ b/text/3840-compat-math-identifiers.md
@@ -20,18 +20,18 @@ Variables are often adorned with subscripts like `x₁₂` or superscripts like
 Having these symbols available as Rust identifiers could simplify the implementation of these concepts and stay closer to a reference publication, thus reducing confusing and implementation errors.

 For example instead of:
-```
+```rust
 let gradient_energy_1 = 2.0 * (position_1 - center_1);
 let gradient_energy_2 = 2.0 * (position_2 - center_2);
 ```
 one could write:
-```
+```rust
 let ∇E₁ = 2.0 * (p₁ - c₁);
 let ∇E₂ = 2.0 * (p₂ - c₂);
 ```

 A longer example from the "wilds":
-```
+```rust
 fn strain_energy_hessian_coeffs(l0: f64, l: f64) -> [f64; 4] {
     let l02 = l0.powi(2);
     let l03 = l0 * l02;
@@ -47,7 +47,7 @@ fn strain_energy_hessian_coeffs(l0: f64, l: f64) -> [f64; 4] {
 }
 ```
 With this RFC one could write:
-```
+```rust
 fn strain_energy_hessian_coeffs(l₀: f64, l: f64) -> [f64; 4] {
     let l₀² = l₀.powi(2);
     let l₀³ = l₀ * l₀²;
@@ -143,7 +143,7 @@ Subscript characters could be given syntactic meaning, for example `a₁` could

 Rust has the philosophy to be open to various cultures and languages and allow them use their native symbols as identifiers.
 Example which compiles in stable Rust without warning:
-```
+```rust
 fn main() {
     let λ = 2.718_f32;       // Greek letter lambda
     let 파이 = 3.141_f32;    // Korean word for "pie"
@@ -154,7 +154,7 @@ fn main() {

 Many of these characters are easily confusable if not attuned to the corresponding language or culture.
 Example which compiles in stable Rust without warning:
-```
+```rust
 fn main() {
     let 鳯 = 3; // U+9CF5: "phoenix" (old variant)
     let 鳳 = 4; // U+9CF4: modern simplified/traditional
@@ -167,7 +167,7 @@ fn main() {

 A vast set of characters added as part of Unicode character sets are easily confusable with other characters.
 Example which compiles in stable Rust and triggers a "warning: identifier contains a non normalized (NFKC) character":
-```
+```rust
 fn main() {
     let l = 1.0;
     let ℓ = l + 2.0;
@@ -179,7 +179,7 @@ fn main() {

 There are characters which one might argue should never have been added but are part of allowed Unicode sets.
Example which compiles in stable Rust and triggers a "warning: identifier contains an uncommon character": -``` +```rust fn ᅟ() { // U+115F (Hangul Choseong Filler) renders as blank println!("boo"); } From 5556c698938d3b46b6e8a1a883c6107385034a86 Mon Sep 17 00:00:00 2001 From: David Weikersdorfer <517608+Danvil@users.noreply.github.com> Date: Sat, 2 Aug 2025 15:03:13 -0700 Subject: [PATCH 9/9] Fixed grammar and reference to autodiff feature --- text/3840-compat-math-identifiers.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/text/3840-compat-math-identifiers.md b/text/3840-compat-math-identifiers.md index 9f4f056e5d5..260974af71a 100644 --- a/text/3840-compat-math-identifiers.md +++ b/text/3840-compat-math-identifiers.md @@ -17,7 +17,7 @@ Programming languages have historically focused on the quite narrow set of ASCII The vast body of scientific literature uses a variety of characters to express concepts from physics, mathematics, biology, robotics and many others. Symbols often appearing in equations are Roman letters like `x`, Greek letters like `θ`, and differentiation operators like `∂` and `∇`. Variables are often adorned with subscripts like `x₁₂` or superscripts like `x⁺` or `x⁽²⁾`. -Having these symbols available as Rust identifiers could simplify the implementation of these concepts and stay closer to a reference publication, thus reducing confusing and implementation errors. +Having these symbols available as Rust identifiers could simplify the implementation of these concepts and stay closer to a reference publication, thus reducing confusion and implementation errors. For example instead of: ```rust @@ -105,7 +105,7 @@ Note that Unicode specifically [mentions Rust as a positive industry example](ht # Drawbacks [drawbacks]: #drawbacks -* Characters like `𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃` are easily confusable with their base versions `∂∇` and can lead to subtle bugs. 
However the precedence in Rust seems to be to add them alongside their base version but trigger the NFKC warning. +* Characters like `𝛁𝛛𝛻𝜕𝜵𝝏𝝯𝞉𝞩𝟃` are easily confusable with their base versions `∂∇` and can lead to subtle bugs. However, the precedent in Rust is to allow them alongside their base versions while triggering the NFKC warning. * Some developers prefer to only use ASCII characters for programming. This paradigm can be enforced today via `deny(non_ascii_idents)`. This would disallow all characters added by this RFC. @@ -118,7 +118,7 @@ Note that Unicode specifically [mentions Rust as a positive industry example](ht # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -If this RFC is not implemented then everyone has to keep using ASCII characters for identifier in scientific code, for example `gradient_energy` or `a_12`. +If this RFC is not implemented, then everyone has to keep using ASCII characters for identifiers in scientific code, for example `gradient_energy` or `a_12`. The impact of not implementing it should be fairly small, but implementing it could invite more scientific oriented people to the Rust language and make it easier for them to implement complex concepts. @@ -129,12 +129,12 @@ This would open up a host of questions and potential issues, like: - Should `a²⁻³` be interpreted as `1/a`? - There is no superscript character for multiplication `*` or division `/`. -`∞` could be a synonym or replacement to `f32::INIFITY`, however there is no precedence for using non-ASCII characters in `core`/`std` and this would likely meet considerable opposition. +`∞` could be a synonym for, or a replacement of, `f32::INFINITY`; however, there is no precedent for using non-ASCII characters in `core`/`std`, and this would likely meet considerable opposition. Derivatives could be added as a language feature using auto-differentiation techniques and `∇` and `∂` could be given syntactic meaning.
For example Mathematica supports the syntax `∂ₓf` for a partial derivative of `f` with respect to `x` and the syntax `∇ₓf` for the gradient with respect to `x`. -Moreover there is [experimental support for automatic differentiation](https://github.com/rust-lang/rust/issues/124509) being worked on for rustc which uses an attribute `#[autodiff(df, ..)]` with a user-provided function name (`df` in this case) for the automatically generated derivative. -A potential synergy with this RFC would be that the developer can choose `∇f` as the function name of the automatically created derivative via `#[autodiff(∇f, ..)]`, however Rust could also decide to automatically use `∇f` as the name of such derivatives and thus giving `∇f` syntactic meaning. +Moreover, there is [an experimental feature](https://doc.rust-lang.org/nightly/std/autodiff/attr.autodiff_forward.html) for Rust that provides automatic differentiation via the attribute macro `#[autodiff_forward(name, ..)]`, which takes a user-provided function name for the automatically generated derivative. +With this RFC, `∇foo` could be used as that function name, i.e. `#[autodiff_forward(∇foo, ..)]`. However, Rust could also decide to automatically name the derivative of `foo` as `∇foo`, which would give `∇` syntactic meaning. Subscript characters could be given syntatic meaning, for example `a₁` could be a synonym to `a[1]`, however this would be highly contentious and error prone due to the general disagreement between 0-based versus 1-based indexing and would suffer from similar problems as using superscripts for exponentiation. @@ -188,6 +188,7 @@ fn main() { ᅟ(); // dito } ``` +Note that U+115F is the name of the function, even though some browsers render it inside the parentheses. [C++ P3658R0](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3658r0.pdf) is a similar proposal with similar reasoning for the C++ language.
In particular it states that the characters suggested in this RFC where allowed in C++11 to C++20 as originally published.