Short circuit complex case evaluation modes as soon as possible #17898

pepijnve · 2025-10-03T12:28:06Z

Which issue does this PR close?

Improvement in the context of #18075

Rationale for this change

Speculative performance improvements for case evaluation

What changes are included in this PR?

Short circuit the case evaluation loop as soon as a value has been calculated for every input row
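A minimal Rust sketch of the short circuit, using a simplified model rather than the code in case.rs. Rows are plain i64 values instead of Arrow arrays, each WHEN/THEN branch is a hypothetical (predicate, value) pair instead of a PhysicalExpr, and case_when_short_circuit is an illustrative name. The loop tracks how many rows still lack a value and stops evaluating branches once that count reaches zero.

```rust
/// Hypothetical, simplified model of searched CASE evaluation with a short
/// circuit. Returns the per-row results and how many branches were evaluated.
fn case_when_short_circuit(
    rows: &[i64],
    branches: &[(fn(i64) -> bool, i64)],
    else_value: Option<i64>,
) -> (Vec<Option<i64>>, usize) {
    let mut result: Vec<Option<i64>> = vec![None; rows.len()];
    // remainder[i] stays true while row i has not matched any WHEN yet.
    let mut remainder = vec![true; rows.len()];
    let mut remainder_count = rows.len();
    let mut branches_evaluated = 0;

    for (predicate, then_value) in branches {
        // Short circuit: every row already has a value, so later
        // branches (and the ELSE) cannot change the result.
        if remainder_count == 0 {
            break;
        }
        branches_evaluated += 1;
        for (i, &v) in rows.iter().enumerate() {
            if remainder[i] && predicate(v) {
                result[i] = Some(*then_value);
                remainder[i] = false;
                remainder_count -= 1;
            }
        }
    }

    // Rows that matched no WHEN get the ELSE value (or stay NULL).
    if remainder_count > 0 {
        for (i, r) in remainder.iter().enumerate() {
            if *r {
                result[i] = else_value;
            }
        }
    }
    (result, branches_evaluated)
}
```

With rows [5, 150, 2500, -3] and branches v < 0, v < 1000, v < 3000, v < 4000, every row has a value after the third branch, so the fourth branch is never evaluated.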

Are these changes tested?

(Hopefully) covered by SQL logic tests

Are there any user-facing changes?

No

pepijnve · 2025-10-03T12:29:56Z

@alamb could you trigger a benchmark run on this PR (once CI gives the green light)?

alamb · 2025-10-03T13:41:37Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing case_improvements (3b75814) to 10a437b diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=case_improvements
Results will be posted here when complete

alamb · 2025-10-03T13:48:30Z

🤖: Benchmark completed


group                          case_improvements                      main
-----                          -----------------                      ----
case_when: CASE expr           1.00     23.7±0.23µs        ? ?/sec    1.02     24.2±0.24µs        ? ?/sec
case_when: column or null      1.00   1401.1±2.72ns        ? ?/sec    1.01   1409.5±6.15ns        ? ?/sec
case_when: expr or expr        1.00     30.7±0.21µs        ? ?/sec    1.03     31.6±0.22µs        ? ?/sec
case_when: scalar or scalar    1.00      8.0±0.01µs        ? ?/sec    1.02      8.1±0.02µs        ? ?/sec

pepijnve · 2025-10-03T14:08:29Z

Small improvement (which is what I had expected). WDYT, worth integrating?
Could just be noise too though. The 'expr or expr' and 'scalar or scalar' code paths were not modified.

Edit: I had a closer look at the case_when benchmark. It's probably too much of a microbenchmark to see any difference there. It does show that the extra conditionals don't tank performance.

The example I'm looking at locally is a projection where a classification category is associated with each row using a large case expression, something like

SELECT *, CASE WHEN predicate_1 THEN 0 WHEN predicate_2 THEN 1 WHEN predicate_3 THEN 2 ... END FROM table

If my reading of the code is correct, much more data is being manipulated in a scenario like that.

alamb

Seems like it is worth pursuing to me -- thanks @pepijnve

I wonder if we can factor out the pattern of "don't call evaluate_selection if the selection is still the entire remainder" (mostly so we can document the behavior / make sure it doesn't get lost in some future refactoring)

Maybe we could wrap the count in some structure or a function

🤔

let mut remainder = Remainder::new(batch.num_rows());
remainder.update(&self.when_then_expr[i], &batch);
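One possible shape for such a wrapper, sketched under simplifying assumptions: the mask is a plain Vec<bool> rather than an Arrow BooleanArray, and update receives an already-evaluated match mask instead of the WHEN expression and batch used in the snippet above. Remainder and its methods are hypothetical, not an existing DataFusion API.

```rust
/// Hypothetical helper: keeps the "not yet matched" mask together with its
/// true count so the two cannot drift out of sync.
struct Remainder {
    mask: Vec<bool>,
    count: usize,
}

impl Remainder {
    fn new(num_rows: usize) -> Self {
        Remainder { mask: vec![true; num_rows], count: num_rows }
    }

    /// True while no row has matched yet, i.e. the selection is still the
    /// whole batch and a filtering/selection step could be skipped.
    fn is_full(&self) -> bool {
        self.count == self.mask.len()
    }

    /// True once every row has a value; the caller can stop evaluating.
    fn is_empty(&self) -> bool {
        self.count == 0
    }

    /// Clears the bits for rows where `matched` is true and returns how
    /// many rows were newly removed from the remainder.
    fn update(&mut self, matched: &[bool]) -> usize {
        assert_eq!(matched.len(), self.mask.len());
        let mut removed = 0;
        for (m, &hit) in self.mask.iter_mut().zip(matched) {
            if *m && hit {
                *m = false;
                removed += 1;
            }
        }
        self.count -= removed;
        removed
    }
}
```

Keeping the count next to the mask puts both checks ("is the selection still everything?" and "is anything left?") in one documented place, which is the kind of protection against future refactorings mentioned above.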

alamb · 2025-10-03T20:40:44Z

datafusion/physical-expr/src/expressions/case.rs

        let mut current_value = new_null_array(&return_type, batch.num_rows());
        // We only consider non-null values while comparing with whens
        let mut remainder = not(&base_nulls)?;
+        let mut remainder_count = remainder.true_count();


We have found in past evaluations that the code generated for true_count is astonishingly fast (it uses a special hardware instruction), so I am not surprised this works well
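For intuition on why true_count is cheap: counting the set bits of a bit-packed boolean buffer reduces to one population count per 64-bit word. The function below is an illustrative stand-alone sketch, not the arrow-rs implementation, and it assumes LSB-first bit packing in u64 words. u64::count_ones can compile to a hardware popcount instruction (such as POPCNT on x86_64) when the target enables that feature.

```rust
/// Counts set bits among the first `len` bits of an LSB-first, bit-packed
/// buffer stored as u64 words.
fn true_count(words: &[u64], len: usize) -> usize {
    let full_words = len / 64;
    let remaining_bits = len % 64;
    // One population count per fully used 64-bit word.
    let mut count: usize = words[..full_words]
        .iter()
        .map(|w| w.count_ones() as usize)
        .sum();
    if remaining_bits > 0 {
        // Mask off bits beyond `len` in the final, partially used word.
        let mask = (1u64 << remaining_bits) - 1;
        count += (words[full_words] & mask).count_ones() as usize;
    }
    count
}
```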

Good to know that that’s fairly cheap.

What I’m experimenting with is retaining the filtered record batch from one loop iteration to the next so that the amount of data to be churned through each iteration shrinks.
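A rough sketch of that idea under simplifying assumptions: instead of filtering a RecordBatch, it keeps a Vec of (original row index, value) pairs and drops matched rows after each branch with Vec::retain, so later predicates only see rows that still lack a value. case_when_compacting is a hypothetical name; it returns the number of predicate calls only to make the shrinking work visible.

```rust
/// Hypothetical sketch: carry only unmatched rows (with their original
/// positions) from one WHEN branch to the next.
fn case_when_compacting(
    rows: &[i64],
    branches: &[(fn(i64) -> bool, i64)],
    else_value: Option<i64>,
) -> (Vec<Option<i64>>, usize) {
    let mut result: Vec<Option<i64>> = vec![None; rows.len()];
    // Working set: (original row index, value) for rows without a result.
    let mut remaining: Vec<(usize, i64)> = rows.iter().copied().enumerate().collect();
    let mut predicate_calls = 0;

    for (predicate, then_value) in branches {
        if remaining.is_empty() {
            break; // short circuit: nothing left to evaluate
        }
        predicate_calls += remaining.len();
        // Write THEN values for matched rows and drop them from the set.
        remaining.retain(|&(idx, v)| {
            if predicate(v) {
                result[idx] = Some(*then_value);
                false
            } else {
                true
            }
        });
    }
    // Whatever is left matched no WHEN branch.
    for (idx, _) in remaining {
        result[idx] = else_value;
    }
    (result, predicate_calls)
}
```

For rows [5, 150, 2500, -3] and branches v < 0, v < 1000, v < 3000, v < 4000, this makes 4 + 3 + 1 = 8 predicate calls, versus 12 if all four rows were re-evaluated in each of the three branches that run. Whether the extra copying pays off likely depends on how many rows each branch removes.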

I'm going to leave the more complex optimisation work for another PR. The short circuit logic is already useful on its own.

alamb · 2025-10-03T20:49:11Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing case_improvements (3b75814) to 10a437b diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=case_improvements
Results will be posted here when complete

alamb · 2025-10-03T20:55:49Z

🤖: Benchmark completed


group                          case_improvements                      main
-----                          -----------------                      ----
case_when: CASE expr           1.00     24.0±0.10µs        ? ?/sec    1.00     23.9±0.14µs        ? ?/sec
case_when: column or null      1.00   1411.3±1.41ns        ? ?/sec    1.00   1411.1±5.67ns        ? ?/sec
case_when: expr or expr        1.01     31.2±0.25µs        ? ?/sec    1.00     31.0±0.28µs        ? ?/sec
case_when: scalar or scalar    1.00      8.0±0.02µs        ? ?/sec    1.00      8.0±0.02µs        ? ?/sec

pepijnve · 2025-10-15T16:06:31Z

The expr_is_restrict_null_predicate caught a bug I had introduced that was not caught by the SLTs. I've added extra SLTs for case to plug the coverage hole.

alamb · 2025-10-15T20:50:32Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing case_improvements (da76c30) to 057583d diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=case_improvements
Results will be posted here when complete

alamb · 2025-10-15T20:58:03Z

🤖: Benchmark completed


group                          case_improvements                      main
-----                          -----------------                      ----
case_when: CASE expr           1.01     23.3±0.10µs        ? ?/sec    1.00     23.1±0.18µs        ? ?/sec
case_when: column or null      1.00   1430.0±2.39ns        ? ?/sec    1.00  1434.4±12.61ns        ? ?/sec
case_when: expr or expr        1.00     31.1±0.11µs        ? ?/sec    1.00     31.1±0.12µs        ? ?/sec
case_when: scalar or scalar    1.01      8.3±0.07µs        ? ?/sec    1.00      8.2±0.08µs        ? ?/sec

pepijnve · 2025-10-15T22:00:02Z

I’ll have a look at the existing micro benchmarks tomorrow. I’m not sure there’s anything in there yet with enough branches for the impact to be noticeable.

pepijnve · 2025-10-16T11:38:17Z

I had already done the work of adding extra benchmarks in another branch. I've moved that over to #18097 which is now just an extension of the micro benchmarks. That'll make it easier to compare results.

alamb · 2025-10-17T19:20:05Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing case_improvements (558ad59) to ec3d20b diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=case_improvements
Results will be posted here when complete

alamb · 2025-10-17T20:04:08Z

🤖: Benchmark completed


group                                                                                                             case_improvements                      main
-----                                                                                                             -----------------                      ----
case_when 8192x100: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END    1.01       2.6±0.08s        ? ?/sec    1.00       2.6±0.05s        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                         1.00     55.4±0.16µs        ? ?/sec    1.00     55.2±0.08µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                       1.01   478.5±13.92µs        ? ?/sec    1.00    472.1±6.25µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                   1.00      6.7±0.01µs        ? ?/sec    1.00      6.7±0.01µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END           1.00       3.1±0.02s        ? ?/sec    1.00       3.1±0.01s        ? ?/sec
case_when 8192x100: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                     1.00    458.6±7.40µs        ? ?/sec    1.05   481.7±15.12µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END      1.01    489.8±3.82ms        ? ?/sec    1.00    484.6±4.84ms        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                           1.00     55.3±0.24µs        ? ?/sec    1.00     55.3±0.28µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                         1.03    104.6±1.02µs        ? ?/sec    1.00    102.0±0.39µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                     1.00      6.6±0.01µs        ? ?/sec    1.00      6.6±0.01µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END             1.00    558.9±1.86ms        ? ?/sec    1.00    557.5±2.35ms        ? ?/sec
case_when 8192x3: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                       1.01    115.2±0.45µs        ? ?/sec    1.00    114.2±0.48µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END     1.00  1434.2±33.04ms        ? ?/sec    1.00  1433.9±28.82ms        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                          1.00     55.3±0.12µs        ? ?/sec    1.00     55.3±0.10µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                        1.02    263.4±3.35µs        ? ?/sec    1.00    259.1±4.86µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                    1.01      6.7±0.02µs        ? ?/sec    1.00      6.6±0.01µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END            1.00   1782.6±7.70ms        ? ?/sec    1.00   1787.7±8.00ms        ? ?/sec
case_when 8192x50: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                      1.03    275.1±5.26µs        ? ?/sec    1.00    267.7±2.42µs        ? ?/sec

pepijnve · 2025-10-17T20:14:55Z

Going to take a closer look at this. In one of the benches I would expect a more significant difference.

pepijnve · 2025-10-17T21:12:50Z

🤦‍♂️ I botched the benchmark. Operator::Eq in the predicates instead of Operator::LtEq...

pepijnve · 2025-10-17T21:20:59Z

So sorry about that @alamb. #18144 makes the benchmark actually do what it's supposed to do.
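A small illustration of why that mix-up would hide the improvement, using made-up data rather than the benchmark's: with range predicates the set of unmatched rows shrinks to zero after a few branches, while with equality predicates few or no rows match, so the remainder never empties and every branch is still evaluated. branches_evaluated is a hypothetical helper.

```rust
/// Hypothetical helper: counts how many WHEN branches run before every row
/// has matched. `use_eq` selects `v == t` predicates instead of `v <= t`.
fn branches_evaluated(rows: &[i64], thresholds: &[i64], use_eq: bool) -> usize {
    let mut matched = vec![false; rows.len()];
    let mut remaining = rows.len();
    let mut evaluated = 0;
    for &t in thresholds {
        if remaining == 0 {
            break; // short circuit once every row has a value
        }
        evaluated += 1;
        for (i, &v) in rows.iter().enumerate() {
            let hit = if use_eq { v == t } else { v <= t };
            if !matched[i] && hit {
                matched[i] = true;
                remaining -= 1;
            }
        }
    }
    evaluated
}
```

With rows [10, 1500, 2999, 42] and thresholds 0, 1000, ..., 9000, the `<=` variant stops after 4 branches while the `==` variant runs all 10. This would also be consistent with the equality-based rows staying at roughly 1.00 in the later run, though the sketch says nothing about the actual benchmark data.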

alamb · 2025-10-18T12:40:25Z

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1017-gcp #18~24.04.1-Ubuntu SMP Tue Sep 23 17:51:44 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing case_improvements (3f0eaa3) to 93f136c diff
BENCH_NAME=case_when
BENCH_COMMAND=cargo bench --bench case_when
BENCH_FILTER=
BENCH_BRANCH_NAME=case_improvements
Results will be posted here when complete

alamb · 2025-10-18T13:23:36Z

🤖: Benchmark completed


group                                                                                                             case_improvements                      main
-----                                                                                                             -----------------                      ----
case_when 8192x100: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END    1.00      3.6±0.01ms        ? ?/sec    5.77     20.5±0.34ms        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                         1.00     55.7±0.09µs        ? ?/sec    1.02     56.6±0.42µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                       1.02    376.7±9.66µs        ? ?/sec    1.00    370.3±2.07µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                   1.00      6.7±0.02µs        ? ?/sec    1.01      6.8±0.02µs        ? ?/sec
case_when 8192x100: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END           1.00       2.8±0.01s        ? ?/sec    1.00       2.8±0.02s        ? ?/sec
case_when 8192x100: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                          1.00       2.8±0.01s        ? ?/sec    1.00       2.8±0.01s        ? ?/sec
case_when 8192x100: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                     1.00    370.7±8.05µs        ? ?/sec    1.01    374.7±8.37µs        ? ?/sec
case_when 8192x100: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                1.00      3.5±0.02ms        ? ?/sec    20.04    71.1±0.40ms        ? ?/sec
case_when 8192x3: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END      1.00    419.7±3.13µs        ? ?/sec    36.65    15.4±0.12ms        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                           1.00     55.2±0.15µs        ? ?/sec    1.02     56.3±0.11µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                         1.02     23.0±0.36µs        ? ?/sec    1.00     22.4±0.33µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                     1.01      6.8±0.02µs        ? ?/sec    1.00      6.7±0.01µs        ? ?/sec
case_when 8192x3: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END             1.02    259.9±0.68ms        ? ?/sec    1.00    255.9±0.83ms        ? ?/sec
case_when 8192x3: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                            1.01    238.0±0.55ms        ? ?/sec    1.00    234.8±0.82ms        ? ?/sec
case_when 8192x3: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                       1.05     32.2±0.50µs        ? ?/sec    1.00     30.5±0.28µs        ? ?/sec
case_when 8192x3: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                  1.00    385.9±3.18µs        ? ?/sec    169.08    65.2±0.26ms        ? ?/sec
case_when 8192x50: CASE WHEN c1 < 0 THEN 0 WHEN c1 < 1000 THEN 1 ... WHEN c1 < n * 1000 THEN n ELSE n + 1 END     1.00  1885.7±17.69µs        ? ?/sec    9.95     18.8±0.31ms        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN 1 ELSE 0 END                                                          1.00     55.6±0.14µs        ? ?/sec    1.01     56.4±0.10µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 ELSE c3 END                                                        1.04    171.8±5.61µs        ? ?/sec    1.00    165.8±2.92µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 <= 500 THEN c2 [ELSE NULL] END                                                    1.00      6.8±0.01µs        ? ?/sec    1.00      6.8±0.02µs        ? ?/sec
case_when 8192x50: CASE WHEN c1 == 0 THEN 0 WHEN c1 == 1 THEN 1 ... WHEN c1 == n THEN n ELSE n + 1 END            1.00   1459.3±7.66ms        ? ?/sec    1.00   1453.8±6.30ms        ? ?/sec
case_when 8192x50: CASE c1 WHEN 0 THEN 0 WHEN 1 THEN 1 ... WHEN n THEN n ELSE n + 1 END                           1.00   1432.8±7.46ms        ? ?/sec    1.00   1428.1±6.66ms        ? ?/sec
case_when 8192x50: CASE c1 WHEN 1 THEN c2 WHEN 2 THEN c3 END                                                      1.07    185.5±4.28µs        ? ?/sec    1.00    173.4±1.21µs        ? ?/sec
case_when 8192x50: CASE c2 WHEN 0 THEN 0 WHEN 1000 THEN 1 ... WHEN n * 1000 THEN n ELSE n + 1 END                 1.00  1861.4±10.71µs        ? ?/sec    37.20    69.2±0.30ms        ? ?/sec

alamb · 2025-10-18T13:26:47Z

pepijnve · 2025-10-18T13:28:37Z

Those are the gains I was expecting to see! 🚀 Just to keep our feet solidly on the ground, these are somewhat contrived and extreme examples. But I'll take it.

alamb

Thank you for the diligence @pepijnve

alamb · 2025-10-18T13:30:27Z

datafusion/physical-expr/src/expressions/case.rs

        // start with nulls as default output
        let mut current_value = new_null_array(&return_type, batch.num_rows());
        let mut remainder = BooleanArray::from(vec![true; batch.num_rows()]);
+        let mut remainder_count = batch.num_rows();


Given the similarity of this code and the one above, I wonder if there is some way to avoid the duplication (as part of a follow-on PR)

In #18152 the code is changing a bit further. If the approach there pans out I want to try to do the same for case_when_with_expr. It'll be easier to see if there's an extractable pattern once that work settles down, so if it's ok with you I would like to postpone your suggestion for a little bit.

Yes, of course -- we can always make the code better in follow-on PRs

github-actions bot added the physical-expr Changes to the physical-expr crates label Oct 3, 2025

pepijnve force-pushed the case_improvements branch from bb647a5 to 3b75814 October 3, 2025 12:28

alamb reviewed Oct 3, 2025


alamb mentioned this pull request Oct 15, 2025

[EPIC] A collection of items to improve CASE performance #18075

Open

8 tasks

pepijnve force-pushed the case_improvements branch from 3b75814 to a1add0a October 15, 2025 14:46

pepijnve changed the title ~~Case evaluation improvements~~ Short circuit complex case evaluation modes as soon as possible Oct 15, 2025

pepijnve marked this pull request as ready for review October 15, 2025 14:46

Short circuit case evaluation as soon as all rows have been evaluated

563dbf0

pepijnve force-pushed the case_improvements branch from a1add0a to 563dbf0 October 15, 2025 15:51

Add extra SLTs

da76c30

github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Oct 15, 2025

Merge branch 'main' into case_improvements

558ad59

Merge branch 'main' into case_improvements

3f0eaa3

pepijnve mentioned this pull request Oct 18, 2025

Reduce unnecessary record batch filtering in "case with(out) expression" #18152

Draft

alamb approved these changes Oct 18, 2025


alamb added the performance Make DataFusion faster label Oct 18, 2025
