Closed as not planned
Description
What version of regex are you using?
Version 1.9.6
Describe the bug at a high level.
Simple regexes like \w{256} take tens of milliseconds to compile in release mode, and hundreds of milliseconds in debug mode.
Consider a code base with a single regex like this and a hundred tests in debug mode. If each test runs in a different process, that's about 30 seconds of CPU time spent solely on building this one regex for every run of the test suite.
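The 30-second estimate is just the per-compile cost multiplied by the number of test processes. A minimal sketch of that arithmetic, assuming the roughly 290 ms (debug) and 40 ms (release) compile times reported in this issue; `total_compile_overhead` is a hypothetical helper name used only for illustration:

```rust
use std::time::Duration;

/// Total CPU time spent compiling one regex when every test process
/// has to build it from scratch. Hypothetical helper, not a regex API.
fn total_compile_overhead(per_compile: Duration, processes: u32) -> Duration {
    per_compile * processes
}

fn main() {
    // Figures taken from the measurements in this report (rounded).
    let debug = total_compile_overhead(Duration::from_millis(290), 100);
    let release = total_compile_overhead(Duration::from_millis(40), 100);

    println!("debug:   {:.1} s per test-suite run", debug.as_secs_f64());
    println!("release: {:.1} s per test-suite run", release.as_secs_f64());
}
```

With these inputs the debug figure comes out at 29 seconds, which is where the "about 30 seconds" above comes from.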
What are the steps to reproduce the behavior?
#![feature(test)]
use regex::Regex;
extern crate test;

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn compile_unicode_regex(b: &mut Bencher) {
        b.iter(|| Regex::new(r"\w{256}"));
    }
}
What is the actual behavior?
About 40 milliseconds in release mode and 290 milliseconds in debug mode to build \w{256}:
$ cargo +nightly bench
Finished bench [optimized] target(s) in 0.31s
Running unittests src/lib.rs (target/release/deps/regex_slow-319c2c8ca1c009f8)
running 1 test
test tests::compile_unicode_regex ... bench: 41,179,283 ns/iter (+/- 1,554,197)
$ cargo +nightly bench --profile dev
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running unittests src/lib.rs (target/debug/deps/regex_slow-623fc42fab1b6baf)
running 1 test
test tests::compile_unicode_regex ... bench: 287,807,317 ns/iter (+/- 6,803,723)
What is the expected behavior?
Something similar to Python, where it takes about 50 microseconds to build and match \w{256}:
import re
import time
start = time.perf_counter()
assert re.compile(r"\w{256}").match("é" * 256)
print(f"> {(time.perf_counter() - start) * 1_000_000:.3f} μs")
# Outputs:
# > 50.566 μs
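The Python snippet times a single build-and-match, whereas the Rust numbers above come from the nightly-only bench harness. For a like-for-like one-shot measurement on stable Rust, something along these lines could be used. This is only a sketch: `time_once` is a hypothetical helper, and the workload in `main` is a stand-in so the snippet builds with std alone.

```rust
use std::time::{Duration, Instant};

/// Runs `f` exactly once and returns its result together with the
/// wall-clock time it took. Hypothetical helper, not part of any crate.
fn time_once<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

fn main() {
    // Stand-in workload. With the regex crate as a dependency, the closure
    // would instead be something like:
    //   || regex::Regex::new(r"\w{256}").unwrap().is_match(&"é".repeat(256))
    let (sum, elapsed) = time_once(|| (0..1_000u64).sum::<u64>());

    println!("sum = {sum}");
    println!("> {:.3} μs", elapsed.as_secs_f64() * 1_000_000.0);
}
```

A single run like this includes any first-call costs, which is the same thing the Python measurement captures.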