attempt to ensure a better query execution plan for test run stats by forbidding hash joins (and some others) #809
TL;DR
In some cases (at least when there are few phantom decisions), the query optimizer was choosing a really bad execution plan for the query that gets rule execution statistics on a test run: it planned a full table scan of the decision_rules table.
To avoid this, I forbid hash joins, merge joins, and the reordering of joins for that query.
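For readers who want to see what such a restriction can look like, here is a minimal sketch. It assumes the database is PostgreSQL, which this description does not state, and it uses the standard PostgreSQL planner settings `enable_hashjoin`, `enable_mergejoin`, and `join_collapse_limit`. The helper name `with_planner_overrides` is hypothetical and is not taken from the PR's code.

```python
# Sketch only: assumes PostgreSQL. The PR text names neither the database
# nor the exact settings it changes, so treat these as illustrative.
PLANNER_OVERRIDES = {
    "enable_hashjoin": "off",    # discourage hash joins
    "enable_mergejoin": "off",   # discourage merge joins
    "join_collapse_limit": "1",  # keep explicit JOINs in the order written
}


def with_planner_overrides(query: str) -> list[str]:
    """Return the statements to run, in order, on a single connection.

    Hypothetical helper: SET LOCAL only lasts until the end of the current
    transaction, so the overrides cannot leak into other queries.
    """
    statements = ["BEGIN"]
    statements += [
        f"SET LOCAL {name} = {value}" for name, value in PLANNER_OVERRIDES.items()
    ]
    statements += [query, "COMMIT"]
    return statements
```

Scoping the settings with `SET LOCAL` inside a transaction keeps the rest of the application on the default planner behavior. In PostgreSQL the `enable_*` settings penalize a plan type heavily rather than strictly forbidding it.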
Details
Disclaimer
Sorry for the big empty space below the text; it seems to be an unexpected side effect of using the details tag in GitHub Markdown.
Now to the real details
Consider the query for getting rule execution stats on (phantom_)decisions, as below.
Click to toggle the query plan in the case of starting with the decisions table
Click to toggle the query plan in the case of starting with the phantom_decisions table
=> notice the hash join with a full table scan on decision_rules, which is never going to complete
Click to toggle the query plan in the case of starting with the phantom_decisions table, after the PR
=> the hash join is gone and the query can execute fairly quickly (it may still be slow if there are a lot of decisions, but now it has a chance to complete, and we can reuse this query to precompute the stats as we go)