fix: harden regex builtins with compiled-size limit #705

Merged
anakrish merged 1 commit into microsoft:main from anakrish:feature/regex-size-limit
May 4, 2026
Conversation

Collaborator

@anakrish anakrish commented May 4, 2026

Summary

Adds a compiled-size limit to all regex builtins in Regorus, preventing adversarial patterns from consuming excessive CPU during DFA construction or matching.

Approach

  • Engine-level cap: Uses RegexBuilder::size_limit(100KB) to reject patterns whose compiled NFA exceeds the limit
  • Centralized helper: compile_regex_for_builtin() applies the limit consistently across all regex builtins
  • Error propagation: LimitError::RegexSizeLimitExceeded propagates as a fatal VM error that cannot be absorbed by rule evaluation (including under negation)
  • State cleanup: restore_rule_state() ensures register/loop/comprehension stacks are properly unwound on fatal errors

Why 100KB?

  • All real-world policy patterns observed in production compile well under 100KB
  • All adversarial patterns identified during benchmarking exceed this threshold
  • Making the limit configurable at runtime can be added as a follow-up if needed

Pre-existing bug fix

Resource-limit errors (time, memory, instruction count) raised inside builtins were silently swallowed to Undefined when strict_builtin_errors was off (the default). Left unfixed, the same path would have swallowed the new size-limit error: `not regex.match(...)` with an oversized pattern would see Undefined and flip to true, which is silently wrong. Fixed by teaching the error-absorption paths in the interpreter, RVM builtin dispatch, and RVM rule-execution loop to recognize LimitError and propagate it.
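
The failure mode can be reproduced in miniature. The sketch below uses simplified stand-in types; `Value`, `BuiltinError`, `absorb_legacy`, `absorb`, and `eval_not` are all hypothetical names rather than Regorus APIs. It shows how swallowing a limit error turns a negated check into a pass, and how propagating the error avoids that.

```rust
/// Simplified stand-in for an evaluation result (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Undefined,
}

/// Simplified stand-in for builtin failures (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum BuiltinError {
    /// Resource-limit failure: time, memory, instruction count, regex size.
    Limit(String),
    /// Any other builtin failure, for example a type mismatch.
    Other(String),
}

/// Behavior described as the bug: in non-strict mode every error,
/// including a limit error, collapses to Undefined.
fn absorb_legacy(
    result: Result<Value, BuiltinError>,
    strict: bool,
) -> Result<Value, BuiltinError> {
    match result {
        Err(_) if !strict => Ok(Value::Undefined),
        other => other,
    }
}

/// Behavior described as the fix: limit errors always propagate;
/// other errors are absorbed only when strict mode is off.
fn absorb(
    result: Result<Value, BuiltinError>,
    strict: bool,
) -> Result<Value, BuiltinError> {
    match result {
        Ok(v) => Ok(v),
        Err(e @ BuiltinError::Limit(_)) => Err(e),
        Err(e) if strict => Err(e),
        Err(_) => Ok(Value::Undefined),
    }
}

/// Rego-style negation: `not x` holds when x is undefined or false.
fn eval_not(v: &Value) -> bool {
    matches!(v, Value::Undefined | Value::Bool(false))
}
```

With `absorb_legacy`, a limit error becomes `Undefined`, so `eval_not` reports success for a pattern that was never evaluated. With `absorb`, the same error reaches the caller and the negation is never computed.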

Test coverage

Regression tests cover legitimate policy patterns (IPv4, hostname, semver, UUID, email, ARM resource IDs, etc.) alongside adversarial patterns that must be rejected. Edge cases include negation (not regex.match), and oversized patterns through every regex builtin: regex.replace, regex.find_n, regex.find_all_string_submatch_n, regex.split, regex.template_match, and regex.is_valid. The test harness runs each case through both the tree-walking interpreter and the RVM.

Related

@anakrish anakrish requested a review from Copilot May 4, 2026 15:23
@anakrish anakrish marked this pull request as draft May 4, 2026 15:24

Copilot AI left a comment

Pull request overview

This PR hardens Rego regex.* builtins by compiling patterns with a fixed size limit and by propagating resource-limit failures instead of silently converting them to Undefined. It fits into the engine’s broader execution-limit machinery by extending both the interpreter and RVM error paths.

Changes:

  • Add a 100 KiB compiled-regex size limit and route regex builtin compilation through shared limit-aware helpers.
  • Introduce RegexSizeLimitExceeded as a first-class limit error in shared limits and VM error types.
  • Update interpreter/RVM builtin and rule-evaluation paths so limit errors propagate even when strict_builtin_errors is off, plus add RVM YAML coverage for regex size-limit cases.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| tests/rvm/rego/cases/regex_size_limit.yaml | Adds RVM regression cases for valid, invalid, and oversized regex patterns. |
| src/utils/limits/error.rs | Adds shared LimitError::RegexSizeLimitExceeded. |
| src/rvm/vm/rules.rs | Treats limit errors as fatal during common rule execution. |
| src/rvm/vm/machine.rs | Maps regex size-limit errors into VmError. |
| src/rvm/vm/functions.rs | Stops swallowing builtin limit errors in non-strict RVM mode. |
| src/rvm/vm/execution.rs | Prevents stackless rule evaluation from absorbing fatal VM errors. |
| src/rvm/vm/errors.rs | Adds VM regex size-limit error and preserves limit-error identity on conversion. |
| src/interpreter.rs | Stops swallowing builtin limit errors in non-strict interpreter mode. |
| src/builtins/regex.rs | Enforces regex size limits across regex builtins and special-cases regex.is_valid. |
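
One way to picture "preserves limit-error identity on conversion" is a conversion that downcasts a type-erased error before falling back to a generic variant. This is a sketch under stated assumptions, not the code in src/rvm/vm/errors.rs: `to_vm_error`, the `TimeLimitExceeded` variant, the `Builtin` variant, and the use of `Box<dyn Error>` are all hypothetical. Only the `RegexSizeLimitExceeded` name is taken from the PR.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical shared limit error; only RegexSizeLimitExceeded is named in the PR.
#[derive(Debug, Clone, PartialEq)]
enum LimitError {
    RegexSizeLimitExceeded,
    TimeLimitExceeded,
}

impl fmt::Display for LimitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LimitError::RegexSizeLimitExceeded => write!(f, "regex size limit exceeded"),
            LimitError::TimeLimitExceeded => write!(f, "time limit exceeded"),
        }
    }
}

impl Error for LimitError {}

/// Hypothetical VM error with dedicated variants for limit failures.
#[derive(Debug, PartialEq)]
enum VmError {
    RegexSizeLimitExceeded,
    TimeLimitExceeded,
    Builtin(String),
}

/// Convert a type-erased builtin error into a VM error. Limit errors keep
/// their own variant so later stages can still recognize them as fatal;
/// everything else is flattened into a message.
fn to_vm_error(err: Box<dyn Error>) -> VmError {
    match err.downcast_ref::<LimitError>() {
        Some(LimitError::RegexSizeLimitExceeded) => VmError::RegexSizeLimitExceeded,
        Some(LimitError::TimeLimitExceeded) => VmError::TimeLimitExceeded,
        None => VmError::Builtin(err.to_string()),
    }
}
```

The design point is that flattening every error into a string at this boundary would make a limit failure indistinguishable from an ordinary builtin failure, which is what allowed it to be absorbed downstream.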

Comment thread src/rvm/vm/rules.rs
Comment thread src/rvm/vm/rules.rs
Comment thread src/rvm/vm/execution.rs
Comment thread src/rvm/vm/errors.rs
Comment thread tests/rvm/rego/cases/regex_size_limit.yaml
Comment thread src/builtins/regex.rs
@anakrish anakrish force-pushed the feature/regex-size-limit branch from a231ccc to 89125de Compare May 4, 2026 17:22
@anakrish anakrish requested a review from Copilot May 4, 2026 17:29

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.


Comment thread src/rvm/vm/functions.rs
Comment thread src/builtins/regex.rs
Comment thread src/builtins/regex.rs
Comment thread src/builtins/regex.rs
Comment thread src/interpreter.rs
Comment thread tests/rvm/rego/cases/regex_size_limit.yaml
Comment thread src/builtins/regex.rs
Comment thread src/builtins/regex.rs

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.


Comment thread src/builtins/regex.rs
Comment thread src/rvm/vm/functions.rs
Comment thread src/builtins/regex.rs
Comment thread src/builtins/regex.rs
Comment thread src/builtins/regex.rs
Comment thread src/builtins/regex.rs
Comment thread src/builtins/regex.rs
Comment thread src/interpreter.rs
Add a 100KB cap on compiled regex NFA size via RegexBuilder::size_limit()
to block patterns that blow up in memory or CPU. Regex compilation now
goes through a single helper (compile_regex_for_builtin) so the limit
is enforced consistently across all regex builtins.

While doing this, found and fixed a pre-existing bug: resource-limit
errors (time, memory, instruction count) raised inside builtins were
quietly swallowed to Undefined when strict_builtin_errors was off
(the default). This is a problem because `not regex.match(...)` would
see Undefined and flip to true -- silently wrong. The same issue now
applies to the new regex size limit.

Fixed by teaching the three error-absorption paths (interpreter builtin
call, RVM builtin dispatch, and RVM rule-execution loop) to recognize
LimitError and let it propagate instead of eating it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@anakrish anakrish force-pushed the feature/regex-size-limit branch from 2f5c598 to 0b3e5ee Compare May 4, 2026 18:44
@anakrish anakrish requested a review from Copilot May 4, 2026 18:54

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.


Comment thread src/rvm/vm/functions.rs
Comment thread src/builtins/regex.rs
Comment thread src/rvm/vm/rules.rs
Comment thread src/utils/limits/error.rs
@anakrish anakrish marked this pull request as ready for review May 4, 2026 20:20
@anakrish anakrish merged commit 87f22a7 into microsoft:main May 4, 2026
63 checks passed
3 participants