Fix set explain regex #20319

Merged
merged 5 commits into from
May 28, 2025

Conversation

jasonmp85
Contributor

What does this PR do?

Reapplies the feature added in #20106, addressing the exponential runtime that certain SQL payloads could cause in the original implementation.

Notably, it removes repetition operators from the inner groups of the regex and refactors it so that repetition is applied only to the outer groups that wrap the inner alternations.
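
As a rough illustration of that kind of refactor, here is a minimal sketch using two toy patterns. They are hypothetical and are not the regex changed in this PR; the names `NESTED`, `FLAT`, and `seconds_to_reject` are made up for the example. Both patterns accept the same strings, but only one repeats something that is already repeated.

```python
import re
import time

# Hypothetical toy patterns, not the regex changed in this PR.
# NESTED puts "+" on the inner alternatives and again on the outer group,
# so a backtracking engine can split a run of "a"s in exponentially many ways.
NESTED = re.compile(r"^(?:a+|b+)+$")
# FLAT accepts the same strings, but repetition appears only on the outer group.
FLAT = re.compile(r"^(?:a|b)+$")


def seconds_to_reject(pattern, n):
    """Time one failed match against n "a"s followed by a character that cannot match."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "!" forces the match to fail
    return elapsed


if __name__ == "__main__":
    # Keep n small: the NESTED timing is expected to grow quickly with n.
    for n in (12, 16, 20):
        print(n, seconds_to_reject(NESTED, n), seconds_to_reject(FLAT, n))
```

With a backtracking engine such as Python's `re`, the time to reject the input with `NESTED` should grow roughly exponentially in `n`, while `FLAT` should stay close to linear.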

Testing

To test this, I wired up sqlsmith to a Python script that can generate the whole SET syntax. Between zero and ten SET statements, along with inline comments, were prepended at random to every sqlsmith-produced statement. Over 100,000 statements were processed in this manner, and across many such runs, no exponential behavior was observed with the new regex (the old regex hung regularly and quickly).
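
The prepending step could be sketched roughly as follows. This is not the script used for this PR: the generator covers only a tiny, hypothetical subset of the SET syntax, the parameter names and values are arbitrary examples, and a fixed string stands in for a sqlsmith-produced statement.

```python
import random

# Hypothetical, heavily simplified subset of the SET syntax (illustration only).
PARAMS = ["search_path", "statement_timeout", "work_mem"]
VALUES = ["public", "'5s'", "DEFAULT", "64"]
COMMENTS = ["", "/* fuzz */", "/* a=1; b=2 */"]


def random_set(rng):
    """Build one syntactically plausible SET statement with an optional inline comment."""
    scope = rng.choice(["", "SESSION ", "LOCAL "])
    sep = rng.choice([" = ", " TO "])
    comment = rng.choice(COMMENTS)
    return f"SET {scope}{rng.choice(PARAMS)}{sep}{rng.choice(VALUES)} {comment};"


def with_set_prefix(statement, rng):
    """Prepend between zero and ten random SET statements to a statement."""
    count = rng.randint(0, 10)
    prefix = " ".join(random_set(rng) for _ in range(count))
    return f"{prefix} {statement}".strip(), count


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a failing input can be reproduced
    sql, count = with_set_prefix("SELECT 1", rng)
    print(count, sql)
```

Seeding the generator is a design choice that makes it easy to replay any input that causes a hang.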

Performance

One such run of over 200,000 statements is illustrated in the histogram below. The buckets are runtimes in nanoseconds.

                      ┌                                        ┐ 
   [    0.0,  5000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 173761   
   [ 5000.0, 10000.0) ┤▇▇▇▇▇▇▇▇ 42857                            
   [10000.0, 15000.0) ┤▇ 6345                                    
   [15000.0, 20000.0) ┤ 1244                                     
   [20000.0, 25000.0) ┤ 341                                      
   [25000.0, 30000.0) ┤ 162                                      
   [30000.0, 35000.0) ┤ 52                                       
   [35000.0, 40000.0) ┤ 22                                       
   [40000.0, 45000.0) ┤ 12                                       
   [45000.0, 50000.0) ┤ 4                                        
   [50000.0, 55000.0) ┤ 4                                        
   [55000.0, 60000.0) ┤ 3                                        
   [60000.0, 65000.0) ┤ 2                                        
   [65000.0, 70000.0) ┤ 0                                        
   [70000.0, 75000.0) ┤ 0                                        
   [75000.0, 80000.0) ┤ 0                                        
   [80000.0, 85000.0) ┤ 0                                        
   [85000.0, 90000.0) ┤ 1                                        
                      └                                        ┘ 
                                      Frequency

Motivation

I'd like to ship this feature. I had to pull the original implementation because it pegged the CPU at 100%.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@jasonmp85
Contributor Author

The performance benchmark timed only the call that trims the SET statements from the SQL using the new regex. The vast majority of calls run in under 10 microseconds on my laptop.
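
A measurement of this kind could be sketched as below. The helper name `bucket_runtimes` is hypothetical, and `func` stands in for whatever call is being timed; the 5,000 ns bucket width matches the histogram in the PR description.

```python
import time
from collections import Counter

BUCKET_NS = 5000  # bucket width in nanoseconds, as in the histogram


def bucket_runtimes(func, inputs, bucket_ns=BUCKET_NS):
    """Time func(item) for each input and count runtimes per bucket.

    Returns a dict mapping each bucket's lower bound (in ns) to its frequency.
    """
    counts = Counter()
    for item in inputs:
        start = time.perf_counter_ns()
        func(item)
        elapsed = time.perf_counter_ns() - start
        counts[(elapsed // bucket_ns) * bucket_ns] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # str.strip is only a placeholder for the function under test.
    print(bucket_runtimes(str.strip, ["  SELECT 1  "] * 1000))
```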

Additionally, after using the fuzzer to identify a snippet of affected SQL, I added it to our unit tests. Without the fix, it hangs indefinitely. With the fix, the whole suite runs imperceptibly fast.

codecov bot commented May 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.33%. Comparing base (0245920) to head (43636d8).
Report is 1 commit behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
confluent_platform ?
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
postgres 93.22% <100.00%> (+3.62%) ⬆️
presto ?
solr ?
tomcat ?
weblogic ?

@jasonmp85 jasonmp85 force-pushed the fix-set-explain-regex branch 2 times, most recently from 9a03168 to 45c2d64 Compare May 21, 2025 18:32
nenadnoveljic
nenadnoveljic previously approved these changes May 21, 2025
@jasonmp85 jasonmp85 enabled auto-merge May 22, 2025 14:53
jasonmp85 added 4 commits May 27, 2025 13:27
Took a bit to pin down why this was happening, but with a fuzzer and
some articles about exponential regex performance, I could address the
underlying issue without changing the code much.
@jasonmp85 jasonmp85 force-pushed the fix-set-explain-regex branch from cd445b7 to 568f7f7 Compare May 27, 2025 19:27
@temporal-github-worker-1 temporal-github-worker-1 bot dismissed nenadnoveljic’s stale review May 27, 2025 19:27

Review from nenadnoveljic is dismissed. Related teams and files:

  • database-monitoring-agent
    • postgres/changelog.d/20319.fixed
    • postgres/datadog_checks/postgres/explain_parameterized_queries.py
    • postgres/datadog_checks/postgres/statement_samples.py
    • postgres/datadog_checks/postgres/util.py
    • postgres/tests/test_statements.py
    • postgres/tests/test_unit.py
@jasonmp85 jasonmp85 force-pushed the fix-set-explain-regex branch from 568f7f7 to 944052a Compare May 27, 2025 20:43
No need to recalculate this, and passing it down will allow trimmed SQL
to reuse the original query's signature.
@jasonmp85 jasonmp85 force-pushed the fix-set-explain-regex branch from 944052a to 43636d8 Compare May 27, 2025 21:02
@jasonmp85
Contributor Author

@nenadnoveljic I thought this auto-merged but something happened with a test. Between being sick and the US holiday, your review became stale… can you re-post approval when you have the chance?

@jasonmp85 jasonmp85 added this pull request to the merge queue May 28, 2025
Merged via the queue into master with commit 196f124 May 28, 2025
22 checks passed
@jasonmp85 jasonmp85 deleted the fix-set-explain-regex branch May 28, 2025 06:47
2 participants