[SPARK-37467][SQL] Consolidate subexpression elimination code for whole stage and non-whole stage #51269

Kimahriman · 2025-06-24T14:59:44Z

What changes were proposed in this pull request?

This PR consolidates the subexpression elimination code paths for whole-stage and non-whole-stage codegen. The whole-stage path appeared to be mostly a superset of the non-whole-stage path; the main difference is that whole-stage does not use the codegen context to track subexpressions.

This cleans things up by making non-whole stage use the same stateless approach as whole-stage so there is a single code path for all subexpression elimination, simplifying future improvements.
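To illustrate what "stateless" means here, the following is a minimal sketch, not Spark's actual implementation. It assumes a toy `Expr` tree and a hypothetical `commonSubexprs` helper: the common subexpressions are returned to the caller rather than recorded on a shared context object, so one function could serve both codegen modes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SubexprSketch {
    // Toy expression node (hypothetical; not Spark's Expression class).
    // Structural equality is approximated by the node's string form.
    static final class Expr {
        final String op;
        final List<Expr> children;

        Expr(String op, Expr... children) {
            this.op = op;
            this.children = Arrays.asList(children);
        }

        @Override
        public String toString() {
            return children.isEmpty() ? op : op + children;
        }
    }

    // Stateless: nothing is stored on a shared context. The caller receives
    // every non-leaf subtree that occurs at least twice, with its count.
    static Map<String, Integer> commonSubexprs(List<Expr> roots) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Expr root : roots) {
            count(root, counts);
        }
        counts.values().removeIf(c -> c < 2);
        return counts;
    }

    // Recursively counts each non-leaf subtree by its string form.
    static void count(Expr e, Map<String, Integer> counts) {
        if (e.children.isEmpty()) {
            return; // leaves are cheap to re-evaluate, so they are skipped
        }
        counts.merge(e.toString(), 1, Integer::sum);
        for (Expr child : e.children) {
            count(child, counts);
        }
    }
}
```

Given the two roots `add(mul(a, b), c)` and `sub(mul(a, b), d)`, the helper reports only `mul(a, b)` as shared. Because the result is a return value, the caller decides how to emit code for it, with no hidden state to reset between calls.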

It shouldn't result in any functionality changes, but there are slight differences in the generated code as a result of this:

- Subexpression results in whole-stage codegen are now always stored in mutable state instead of inlined variables, so that the same code path supports code splitting in non-whole-stage codegen
- Non-whole-stage codegen now supports inlining subexpressions when they are small enough, the same as whole-stage codegen
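The difference between the two storage strategies can be sketched in Java, the language Spark's codegen emits. This is a hand-written illustration, not actual generated code; the class, field, and method names are hypothetical. A result held in a field stays visible to helper methods produced by code splitting, while a local variable is visible only inside the method that declares it.

```java
class GeneratedSketch {
    // Mutable state: the subexpression result lives in a field, so every
    // split-out helper method on this object can read it.
    private int subExprValue;

    // Evaluates the shared subexpression (a * b) once and stores the result.
    private void evalSubExpr(int a, int b) {
        subExprValue = a * b;
    }

    // Helpers that code splitting might produce. They cannot see the
    // caller's local variables, but they can read the field.
    private int project0(int c) {
        return subExprValue + c;
    }

    private int project1(int d) {
        return subExprValue - d;
    }

    int[] applyWithState(int a, int b, int c, int d) {
        evalSubExpr(a, b);
        return new int[] { project0(c), project1(d) };
    }

    // Inlined variant: the result is a local variable. This works only
    // while every consumer stays inside this one method.
    static int[] applyInlined(int a, int b, int c, int d) {
        int subExprValue = a * b;
        return new int[] { subExprValue + c, subExprValue - d };
    }
}
```

Both variants compute the same values. The field-backed form keeps working once the consumers are moved into separate methods.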

I'm not a JVM expert but I don't expect a mutable state to cause a performance regression over an inlined variable, especially since they are used in the split case anyway, and this may make future code splitting improvements simpler.

Why are the changes needed?

Currently, whole-stage and non-whole-stage codegen handle subexpression elimination through different code paths. Adding new capabilities to subexpression elimination is harder when every change has to deal with two independent code paths.

Does this PR introduce any user-facing change?

No, just slight changes in generated code.

How was this patch tested?

Existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

Kimahriman · 2025-06-24T15:12:54Z

This is a redo of #34727, except that it makes non-whole-stage stateless instead of making whole-stage stateful.

Consolidate subexpression elimination for whole stage and non-whole stage

85b581c

github-actions bot added the SQL label Jun 24, 2025

Kimahriman mentioned this pull request Jun 24, 2025

[SPARK-37466][SQL] Support subexpression elimination in higher order functions #51272

Open
