
[SPARK-37467][SQL] Consolidate subexpression elimination code for whole stage and non-whole stage #51269

Open
wants to merge 1 commit into
base: master

Conversation

Kimahriman
Contributor

What changes were proposed in this pull request?

This PR consolidates the code paths for subexpression elimination in whole-stage and non-whole-stage codegen. The whole-stage implementation appears to be mostly a superset of the non-whole-stage one, the difference being that whole-stage does not use the codegen context to track subexpressions.

This cleans things up by making non-whole stage use the same stateless approach as whole-stage so there is a single code path for all subexpression elimination, simplifying future improvements.

It shouldn't result in any functionality changes, but there are slight differences in the generated code as a result of this:

  • Subexpressions in whole-stage codegen now always store their results in mutable state instead of inlined variables, so that the same code supports code splitting in non-whole-stage codegen
  • Non-whole-stage codegen now supports inlining subexpressions when they are small enough, the same as whole-stage codegen

I'm not a JVM expert, but I don't expect mutable state to cause a performance regression compared with an inlined variable, especially since mutable state is already used in the split case anyway, and this may make future code splitting improvements simpler.
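To make the mutable state vs. inlined variable distinction concrete, here is a rough hand-written sketch. It is not actual Spark codegen output, and every name in it (`SubexprSketch`, `subExprValue`, `consumer1`, and so on) is hypothetical. It only assumes that "mutable state" means a field on the generated class and "inlined variable" means a local variable in the generated method.

```java
// Hand-written illustration only; NOT real Spark-generated code.
// All class, field, and method names are hypothetical.
class SubexprSketch {
    // Counts how many times the shared subexpression (a + b) is evaluated.
    static int evalCount = 0;

    static int add(int a, int b) {
        evalCount++;
        return a + b;
    }

    // Form 1: the subexpression result lives in a local variable.
    // This only works while every consumer stays in the same method.
    static int withLocalVariable(int a, int b) {
        int subExpr = add(a, b);           // evaluated once
        return subExpr * 2 + subExpr * 3;  // reused by two consumers
    }

    // Form 2: the result lives in mutable state (an instance field),
    // so consumers can be split into separate methods and still see it.
    private int subExprValue;

    int withMutableState(int a, int b) {
        subExprValue = add(a, b);          // evaluated once
        return consumer1() + consumer2();
    }

    private int consumer1() { return subExprValue * 2; }

    private int consumer2() { return subExprValue * 3; }

    public static void main(String[] args) {
        System.out.println(withLocalVariable(2, 3));
        System.out.println(new SubexprSketch().withMutableState(2, 3));
        System.out.println("evaluations: " + evalCount);
    }
}
```

Both forms evaluate the shared subexpression once per call. The field-based form is the one that still works once the consuming code is split across methods, which is why using it everywhere allows a single code path.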

Why are the changes needed?

Currently, there are different code paths to handle subexpression elimination in whole-stage and non-whole-stage codegen. This makes it harder to add new capabilities to subexpression elimination, because each change has to deal with two independent code paths.

Does this PR introduce any user-facing change?

No, just slight changes in generated code.

How was this patch tested?

Existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jun 24, 2025
@Kimahriman
Contributor Author

This is a redo of #34727, except it goes the route of making non-whole-stage stateless instead of making whole-stage stateful.
