-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix set explain regex #20319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix set explain regex #20319
Conversation
The performance benchmark was around the call to trim the Additionally, after determining a snippet of affected SQL using the fuzzer, I added it to our unit tests. Without the fix, it hangs indefinitely. With the fix the whole suite runs imperceptibly fast. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
9a03168
to
45c2d64
Compare
Took a bit to pin down why this was happening, but with a fuzzer and some articles about exponential regex performance, I could address the underlying issue without changing the code much.
cd445b7
to
568f7f7
Compare
Review from nenadnoveljic is dismissed. Related teams and files:
- database-monitoring-agent
- postgres/changelog.d/20319.fixed
- postgres/datadog_checks/postgres/explain_parameterized_queries.py
- postgres/datadog_checks/postgres/statement_samples.py
- postgres/datadog_checks/postgres/util.py
- postgres/tests/test_statements.py
- postgres/tests/test_unit.py
568f7f7
to
944052a
Compare
No need to recalculate this, and passing it down will allow trimmed SQL to reuse the original query's signature.
944052a
to
43636d8
Compare
@nenadnoveljic I thought this auto-merged but something happened with a test. Between being sick and the US holiday, your review became stale… can you re-post approval when you have the chance? |
What does this PR do?
Reapplies the feature added in #20106, addressing the exponential runtime that certain SQL payloads could cause in the original implementation.
Notably, it removes repetition operators from inner groups in the regex and refactors it so such repetition appears in the outer groupings of inner alternations.
Testing
To test this, I wired up
sqlsmith
to a Python script which could generate the wholeSET
syntax. Between zero and tenSET
statements were prepended to everysqlsmith
-produced statement, as well as inline comments, at random. Over 100,000 statements were processed in this manner, and across many such runs, no exponential behavior was observed with the new regex (the old regex hung regularly, and quickly).Performance
One such run of over 200,000 statements is illustrated with the below histogram. The buckets are runtime, in nanoseconds.
Motivation
I'd like to ship this feature and had to pull it before because of it pegging the CPU to 100%.
Review checklist (to be filled by reviewers)
qa/skip-qa
label if the PR doesn't need to be tested during QA.backport/<branch-name>
label to the PR and it will automatically open a backport PR once this one is merged