fix: harden regex builtins with compiled-size limit #705
Pull request overview
This PR hardens Rego regex.* builtins by compiling patterns with a fixed size limit and by propagating resource-limit failures instead of silently converting them to Undefined. It fits into the engine’s broader execution-limit machinery by extending both the interpreter and RVM error paths.
Changes:
- Add a 100 KiB compiled-regex size limit and route regex builtin compilation through shared limit-aware helpers.
- Introduce `RegexSizeLimitExceeded` as a first-class limit error in shared limits and VM error types.
- Update interpreter/RVM builtin and rule-evaluation paths so limit errors propagate even when `strict_builtin_errors` is off, plus add RVM YAML coverage for regex size-limit cases.
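As a rough illustration of what "first-class limit error" can mean in the list above, the sketch below keeps the limit error as a typed variant when it crosses from a builtin error into a VM error, instead of flattening it into a message string. Only the names `LimitError::RegexSizeLimitExceeded` and `VmError` come from this PR; `BuiltinError`, `VmError::Builtin`, and `is_fatal` are hypothetical and are not claimed to match the Regorus source.

```rust
/// Shared limit error. Only the variant name is taken from the PR.
#[derive(Debug, Clone, PartialEq)]
enum LimitError {
    RegexSizeLimitExceeded,
}

/// Hypothetical builtin-level error: either a resource limit or anything else.
#[derive(Debug, Clone, PartialEq)]
enum BuiltinError {
    Limit(LimitError),
    Other(String),
}

/// Hypothetical VM-level error with a dedicated regex size-limit variant.
#[derive(Debug, Clone, PartialEq)]
enum VmError {
    RegexSizeLimitExceeded,
    Builtin(String),
}

impl From<BuiltinError> for VmError {
    fn from(err: BuiltinError) -> Self {
        match err {
            // The limit error keeps its identity across the conversion.
            BuiltinError::Limit(LimitError::RegexSizeLimitExceeded) => {
                VmError::RegexSizeLimitExceeded
            }
            // Ordinary builtin failures are carried as a message.
            BuiltinError::Other(msg) => VmError::Builtin(msg),
        }
    }
}

impl VmError {
    /// Fatal errors must not be absorbed by rule evaluation (assumed helper).
    fn is_fatal(&self) -> bool {
        matches!(self, VmError::RegexSizeLimitExceeded)
    }
}
```

Because the variant survives the conversion, later code can decide whether to absorb or propagate an error by matching on its type rather than inspecting message text.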
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `tests/rvm/rego/cases/regex_size_limit.yaml` | Adds RVM regression cases for valid, invalid, and oversized regex patterns. |
| `src/utils/limits/error.rs` | Adds shared `LimitError::RegexSizeLimitExceeded`. |
| `src/rvm/vm/rules.rs` | Treats limit errors as fatal during common rule execution. |
| `src/rvm/vm/machine.rs` | Maps regex size-limit errors into `VmError`. |
| `src/rvm/vm/functions.rs` | Stops swallowing builtin limit errors in non-strict RVM mode. |
| `src/rvm/vm/execution.rs` | Prevents stackless rule evaluation from absorbing fatal VM errors. |
| `src/rvm/vm/errors.rs` | Adds VM regex size-limit error and preserves limit-error identity on conversion. |
| `src/interpreter.rs` | Stops swallowing builtin limit errors in non-strict interpreter mode. |
| `src/builtins/regex.rs` | Enforces regex size limits across regex builtins and special-cases `regex.is_valid`. |
Add a 100KB cap on compiled regex NFA size via RegexBuilder::size_limit() to block patterns that blow up in memory or CPU. Regex compilation now goes through a single helper (compile_regex_for_builtin) so the limit is enforced consistently across all regex builtins.

While doing this, found and fixed a pre-existing bug: resource-limit errors (time, memory, instruction count) raised inside builtins were quietly swallowed to Undefined when strict_builtin_errors was off (the default). This is a problem because `not regex.match(...)` would see Undefined and flip to true, which is silently wrong. The same issue now applies to the new regex size limit.

Fixed by teaching the three error-absorption paths (interpreter builtin call, RVM builtin dispatch, and RVM rule-execution loop) to recognize LimitError and let it propagate instead of absorbing it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
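The absorption bug described in the commit message above can be sketched in a few lines. This is a minimal model under stated assumptions, not the Regorus implementation: `Value`, `EvalError`, `absorb_all`, `absorb_non_limit`, and `negate` are all hypothetical names, and the `strict` flag stands in for `strict_builtin_errors`.

```rust
/// Hypothetical stand-in for the engine's value type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Undefined,
}

/// Hypothetical error type: `Limit` marks resource-limit failures,
/// `Other` marks ordinary builtin failures such as a bad argument.
#[derive(Debug, Clone, PartialEq)]
enum EvalError {
    Limit(String),
    Other(String),
}

/// Old behavior: in non-strict mode every builtin error becomes Undefined.
fn absorb_all(result: Result<Value, EvalError>, strict: bool) -> Result<Value, EvalError> {
    match result {
        Err(_) if !strict => Ok(Value::Undefined),
        other => other,
    }
}

/// Fixed behavior: limit errors always propagate; only ordinary errors
/// are turned into Undefined in non-strict mode.
fn absorb_non_limit(result: Result<Value, EvalError>, strict: bool) -> Result<Value, EvalError> {
    match result {
        Err(EvalError::Limit(msg)) => Err(EvalError::Limit(msg)),
        Err(_) if !strict => Ok(Value::Undefined),
        other => other,
    }
}

/// Rego-style negation: `not x` holds when x is undefined or false.
fn negate(value: &Value) -> bool {
    matches!(value, Value::Undefined | Value::Bool(false))
}
```

With `absorb_all`, an oversized pattern in non-strict mode yields `Undefined`, so `negate` returns `true` and the policy silently takes the wrong branch. With `absorb_non_limit`, the same input stays an `Err` and evaluation stops before negation is ever applied.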
Summary
Adds a compiled-size limit to all regex builtins in Regorus, preventing adversarial patterns from consuming excessive CPU during DFA construction or matching.
Approach
- `RegexBuilder::size_limit(100KB)` to reject patterns whose compiled NFA exceeds the limit
- `compile_regex_for_builtin()` applies the limit consistently across all regex builtins
- `LimitError::RegexSizeLimitExceeded` propagates as a fatal VM error that cannot be absorbed by rule evaluation (including under negation)
- `restore_rule_state()` ensures register/loop/comprehension stacks are properly unwound on fatal errors
Why 100KB?
Pre-existing bug fix
Resource-limit errors (time, memory, instruction count) raised inside builtins were silently swallowed to `Undefined` when `strict_builtin_errors` was off (the default). This meant `not regex.match(...)` with an oversized pattern would see `Undefined` and flip to `true`, which is silently wrong. Fixed by teaching the error-absorption paths in the interpreter, RVM builtin dispatch, and RVM rule-execution loop to recognize `LimitError` and propagate it.
Test coverage
Regression tests cover legitimate policy patterns (IPv4, hostname, semver, UUID, email, ARM resource IDs, etc.) alongside adversarial patterns that must be rejected. Edge cases include negation (`not regex.match`) and oversized patterns through every regex builtin: `regex.replace`, `regex.find_n`, `regex.find_all_string_submatch_n`, `regex.split`, `regex.template_match`, and `regex.is_valid`. The test harness runs each case through both the tree-walking interpreter and the RVM.
Related