Speed up codespell:ignore check by skipping the regex in most cases #3721

Open
wants to merge 1 commit into
base: main
nthykier
Contributor

The codespell codebase unsurprisingly spends the vast majority of its runtime in regex-related code such as `search` and `finditer`.

The best way to reduce the time spent in regexes is to avoid running the regex in the first place, since the regex engine has a rather steep overhead compared with plain string primitives (the price of its flexibility). If the regex rarely matches and there is a simple static substring that every match must contain, you can speed up the code by using `substring in string` as a guard condition to skip the regex. This assumes the regex is run often enough for its performance to matter.

An obvious candidate is the `codespell:ignore` regex, because every match must contain the distinctive literal substring `codespell:ignore`, and checking for that substring rules out almost all lines that will not match.
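A minimal sketch of the guard described above. The pattern below is a simplified, hypothetical stand-in for the inline-ignore regex, and `find_inline_ignore` is an illustrative name, not the function touched by this PR. The key property is that the literal marker appears in every string the regex can match, so the substring test can never hide a real match.

```python
import re

# Hypothetical stand-in for the inline-ignore pattern; the actual regex
# in codespell may differ in its details.
INLINE_IGNORE_RE = re.compile(
    r"[^\w\s]\s*codespell:ignore\b(?:\s+(?P<words>[\w,]*))?"
)

# Literal text that every match of the pattern above must contain.
MARKER = "codespell:ignore"


def find_inline_ignore(line):
    """Return the match for an inline ignore directive, or None."""
    # Cheap substring test first: lines without the marker cannot match,
    # so the regex engine is never invoked for them.
    if MARKER not in line:
        return None
    return INLINE_IGNORE_RE.search(line)


def find_inline_ignore_unguarded(line):
    """Reference version that always runs the regex."""
    return INLINE_IGNORE_RE.search(line)
```

The guard only skips work; lines that do contain the marker still go through the regex, so the result is the same as the unguarded version for every input.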

With this little trick, runtime goes from ~5.4s to ~4.5s on the corpus mentioned in #3419.
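To check the effect on a small scale, a rough micro-benchmark along these lines can compare the two approaches. Everything here is an assumption for illustration: the synthetic lines, the simplified pattern, and the helper names are not taken from codespell or from the corpus in #3419, and the timings it prints will vary by machine and Python version.

```python
import re
import timeit

# Simplified, hypothetical stand-in for the inline-ignore regex.
PATTERN = re.compile(r"[^\w\s]\s*codespell:ignore\b(?:\s+(?P<words>[\w,]*))?")
MARKER = "codespell:ignore"

# Synthetic input: the directive appears on 1 line in 1000, mimicking a
# regex that rarely matches.
LINES = [
    f"value_{i} = compute({i})  # codespell:ignore"
    if i % 1000 == 0
    else f"value_{i} = compute({i})  # ordinary comment"
    for i in range(20_000)
]


def count_regex_only(lines):
    """Run the regex on every line."""
    return sum(1 for line in lines if PATTERN.search(line))


def count_with_guard(lines):
    """Run the regex only on lines that contain the literal marker."""
    return sum(1 for line in lines if MARKER in line and PATTERN.search(line))


def compare(number=3):
    """Return (regex_only_seconds, guarded_seconds) for `number` passes each."""
    regex_only = timeit.timeit(lambda: count_regex_only(LINES), number=number)
    guarded = timeit.timeit(lambda: count_with_guard(LINES), number=number)
    return regex_only, guarded
```

Calling `compare()` returns the two timings; both counting functions should report the same number of matching lines, which is a quick sanity check that the guard does not change behavior.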

nthykier force-pushed the optimize-performance branch from 60ec7c5 to 118c028 on June 19, 2025 19:00
nthykier marked this pull request as ready for review on June 19, 2025 19:04
nthykier force-pushed the optimize-performance branch from 118c028 to 7dbc5a8 on June 22, 2025 21:27