
[Python] Refactor identifiers #4443

Merged
deathaxe merged 9 commits into sublimehq:master from deathaxe:pr/python/refactor-identifiers
Mar 2, 2026

deathaxe (Collaborator) commented Feb 25, 2026

Fixes #3993

This PR optimizes how the Python syntax parses identifiers. It does not yet rely on branching, but it is a step towards more accurate parsing.

It doesn't noticeably change scoping, apart from:

  1. some improvements to meta.path (where possible without adding branching).
  2. no longer scoping special top-level identifiers after dots.
  3. scoping special functions as such only in function calls, as some of them are no longer reserved in Python 3 and are thus likely to be used as normal variables. Furthermore, the scope naming guidelines advise scoping function names as such only in function call expressions.
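
As an illustration of points 2 and 3, here is a minimal Python sketch of the situations involved. The names used (`unicode`, `xrange`, `reduce`, `Record`, `type`, `len`) are illustrative assumptions; this description does not list which identifiers the syntax treats as special.

```python
import builtins

# Names that were built-in functions in Python 2 but are ordinary,
# unreserved identifiers in Python 3 (illustrative selection).
legacy_names = ["unicode", "xrange", "reduce"]
legacy_status = {name: hasattr(builtins, name) for name in legacy_names}

# Such a name is free to be used as a normal variable.
reduce = 0.5
scaled = 10 * reduce


class Record:
    def __init__(self):
        # A built-in name after a dot is just an attribute, not the built-in.
        self.type = "header"
        self.len = 3


record = Record()
attribute_value = record.type   # member access: not the built-in `type`
call_result = len([1, 2, 3])    # function call: the built-in `len`
```

In this sketch only `len(...)` is a call to a built-in function; `reduce`, `record.type` and `self.len` are plain variables or attributes.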

It does, however, improve overall parsing performance by about 15-20% compared to current master. The benchmark used current master's syntax test file and some real-world code files.

Note:

The primary challenge is that Python actually requires two sets of syntax rules: one for global expressions, which are always terminated by a newline (or semicolon), and another for expressions within brackets, which may span multiple lines without requiring a line continuation. This distinction is not yet fully made for identifiers, but this PR takes a first step towards that goal. Any further steps may require branching, which is a whole other story with regard to complexity, and is thus probably something for separate steps.
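
To illustrate the two rule sets, the following sketch uses the standard `ast` module to show that a line break ends an expression outside brackets but not inside them. The helper `parses` and the sample sources are hypothetical; they only demonstrate Python's own grammar, not the syntax definition itself.

```python
import ast


def parses(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Outside brackets, the newline terminates the expression after `obj`,
# so the dangling `.attr` on the next line is a syntax error.
unbracketed = "value = obj\n.attr"

# Inside brackets, the same qualified name may span multiple lines
# without a line continuation.
bracketed = "value = (obj\n.attr)"

# A semicolon terminates a global expression just like a newline does.
two_statements = ast.parse("a = 1; b = 2").body

unbracketed_ok = parses(unbracketed)
bracketed_ok = parses(bracketed)
```

A highlighter therefore has to treat the end of a line differently depending on whether the identifier sits inside brackets.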

Use plural to denote non-popping behavior.

This commit...

1. wraps all identifiers into dedicated contexts, which immediately pop as soon
   as a special variable is consumed. This helps reduce syntax cache size and
   avoids duplicate popping and non-popping contexts.
2. removes redundant includes, which are already handled by `qualified-names`.

Note: This commit reduces parsing time of syntax test files by 13%.

This commit scopes built-in functions only in function call expressions.

Reason: Legacy Python 2 built-in functions in particular are likely to be
overridden and used as normal variables in modern Python code. Highlighting
them as built-ins even though they no longer exist is somewhat off. If a
built-in name is used as a local variable, it is likely intentional and thus
should be scoped as a variable.

This commit...

1. replaces `items` context and its lookahead by appending `after-expression`
   context to each identifier.
2. introduces dedicated `type-hint-names` to replace `qualified-names` usage,
   as those require dedicated item access syntax, a.k.a. `type-hint-lists`.

This commit refactors all identifier-related contexts to replace the redundant
lookaheads that distinguish qualified from unqualified identifiers.

With this commit, identifiers themselves are parsed 20-25% faster.
The overall gain, benchmarked against the main syntax test file, is about 10-15%.

This commit attempts to implement `as foo` expressions with a more common
pattern in all statements/contexts.
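
For reference, these are some of the Python statements in which an `as foo` clause appears. This is a plain Python sketch, not the syntax definition; the variable names are illustrative, and the list is not claimed to be the complete set covered by the commit.

```python
import contextlib
import io
import os.path as osp  # import ... as name

# with ... as name
with contextlib.closing(io.StringIO("data")) as stream:
    text = stream.read()

# except ... as name
try:
    int("x")
except ValueError as error:
    message = type(error).__name__

base = osp.basename("a/b")
```

In each case the identifier after `as` is a newly bound name, which is why a shared pattern across statements is a natural fit.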
Python no longer scopes unqualified variables as `meta.path`.
deathaxe merged commit f329e8a into sublimehq:master Mar 2, 2026
2 checks passed
deathaxe deleted the pr/python/refactor-identifiers branch March 2, 2026 16:42

Development

Successfully merging this pull request may close these issues.

[Python] Special top-level identifiers should not be highlighted in dotted/qualified names

3 participants