Simple regex like \w{256} take tens of milliseconds to compile #1095

Closed as not planned
@vxgmichel

What version of regex are you using?

Version 1.9.6

Describe the bug at a high level.

A simple regex like \w{256} takes tens of milliseconds to compile in release mode, and hundreds of milliseconds in debug mode.

Consider a code base with a single regex like this and a hundred tests run in debug mode. If each test runs in a separate process, that's about 30 seconds of CPU time (100 processes at roughly 290 milliseconds each) spent solely on building this one regex, for every run of the test suite.

What are the steps to reproduce the behavior?

#![feature(test)]

use regex::Regex;

extern crate test;

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn compile_unicode_regex(b: &mut Bencher) {
        b.iter(|| Regex::new(r"\w{256}"));
    }
}

What is the actual behavior?

Building \w{256} takes about 40 milliseconds in release mode and about 290 milliseconds in debug mode:

$ cargo +nightly bench  
    Finished bench [optimized] target(s) in 0.31s
     Running unittests src/lib.rs (target/release/deps/regex_slow-319c2c8ca1c009f8)

running 1 test
test tests::compile_unicode_regex ... bench:  41,179,283 ns/iter (+/- 1,554,197)

$ cargo +nightly bench --profile dev
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running unittests src/lib.rs (target/debug/deps/regex_slow-623fc42fab1b6baf)

running 1 test
test tests::compile_unicode_regex ... bench: 287,807,317 ns/iter (+/- 6,803,723)

What is the expected behavior?

Something similar to Python, where it takes about 50 microseconds to both build \w{256} and match it against a 256-character string:

import re
import time
start = time.perf_counter()
assert re.compile(r"\w{256}").match("é" * 256)
print(f"> {(time.perf_counter() - start) * 1_000_000:.3f} μs")
# Outputs:
# > 50.566 μs
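
The snippet above times compilation and matching together. As a rough sketch of how the two could be measured separately, the variant below clears the `re` module's pattern cache with `re.purge()` before compiling, so the compile step is not served from the cache. The helper name `time_compile_and_match` and the constants are illustrative only, and the timings will vary by machine and Python version, so no expected output is shown.

```python
import re
import time

PATTERN = r"\w{256}"
HAYSTACK = "é" * 256


def time_compile_and_match(pattern: str, haystack: str) -> tuple[float, float]:
    """Return (compile_seconds, match_seconds) for one cold compile and one match."""
    # Clear re's internal cache so that compile() has to do the real work.
    re.purge()

    start = time.perf_counter()
    compiled = re.compile(pattern)
    compile_seconds = time.perf_counter() - start

    start = time.perf_counter()
    matched = compiled.match(haystack)
    match_seconds = time.perf_counter() - start

    # Sanity check: 256 word characters must satisfy \w{256}.
    assert matched is not None, "pattern should match 256 word characters"
    return compile_seconds, match_seconds


if __name__ == "__main__":
    compile_s, match_s = time_compile_and_match(PATTERN, HAYSTACK)
    print(f"compile: {compile_s * 1_000_000:.3f} μs")
    print(f"match:   {match_s * 1_000_000:.3f} μs")
```

A single run like this is noisy; repeating the call and taking the minimum or median would give a steadier figure to compare against the Rust benchmark numbers.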
