Contributor

@avantgardnerio avantgardnerio commented Oct 12, 2025

Which issue does this PR close?

Rationale for this change

It's pretty important that optimizer rules don't change results: LimitPushPastWindows was returning incorrect results for queries with lead().
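
To make the failure mode concrete, here is a minimal sketch of why truncating the input before a forward-looking window function changes results. It is a toy model over a plain slice, not DataFusion code: `lead`, `window_then_limit`, and `limit_then_window` are hypothetical helpers written for this illustration only.

```rust
/// Toy model of `lead(value, offset)` over one ordered partition:
/// row `i` sees the value at row `i + offset`, or `None` past the end.
fn lead(values: &[i64], offset: usize) -> Vec<Option<i64>> {
    (0..values.len())
        .map(|i| values.get(i + offset).copied())
        .collect()
}

/// Correct plan: evaluate the window over every row, then apply LIMIT.
fn window_then_limit(values: &[i64], offset: usize, limit: usize) -> Vec<Option<i64>> {
    lead(values, offset).into_iter().take(limit).collect()
}

/// Unsafe plan: truncate the input to `limit` rows, then evaluate the window.
/// The last `offset` rows lose their look-ahead and come back as `None`.
fn limit_then_window(values: &[i64], offset: usize, limit: usize) -> Vec<Option<i64>> {
    let n = limit.min(values.len());
    lead(&values[..n], offset)
}
```

With input `[10, 20, 30, 40, 50]`, an offset of 1, and a limit of 3, the first plan ends in `Some(40)` while the second ends in `None`, because row 3 was cut off before `lead` could read it. Keeping `limit + offset` input rows and trimming afterwards gives the same answer as the correct plan.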

What changes are included in this PR?

  1. A new LimitEffect, similar to CardinalityEffect, with which plan nodes and window functions can declare their effect on limit_pushdown
  2. Most functions return None or Unknown, but Lead() is given the option to calculate and return its own effect
  3. Updates to the optimizer rule to take these effects, as well as partitioning, into account
  4. Tests for all of the above
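
The idea in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: only the names `LimitEffect`, `None`, and `Unknown` come from the description, while the `Relative` variant and the `lead_effect` and `pushed_limit` helpers are hypothetical. Partitioning is not modelled here.

```rust
/// Illustrative stand-in for the effect a window function declares
/// on a limit that the optimizer wants to push below it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LimitEffect {
    /// Output row `i` depends only on input rows `0..=i`; the limit is unaffected.
    None,
    /// Undeclared or not known at plan time; the limit must not be pushed.
    Unknown,
    /// Needs this many input rows beyond the current one (hypothetical variant).
    Relative(usize),
}

/// Hypothetical effect of `lead(expr, offset)`: it looks `offset` rows ahead.
fn lead_effect(offset: usize) -> LimitEffect {
    LimitEffect::Relative(offset)
}

/// Rows the window's input must still produce so that the first `limit`
/// output rows are unchanged, or `None` if the limit cannot be pushed.
fn pushed_limit(limit: usize, effects: &[LimitEffect]) -> Option<usize> {
    let mut lookahead = 0usize;
    for effect in effects {
        match effect {
            LimitEffect::None => {}
            LimitEffect::Unknown => return None,
            // Several look-ahead functions share the same input rows,
            // so only the largest look-ahead matters.
            LimitEffect::Relative(n) => lookahead = lookahead.max(*n),
        }
    }
    // Refuse to push rather than overflow.
    limit.checked_add(lookahead)
}
```

For example, `pushed_limit(5, &[lead_effect(2), lead_effect(3)])` yields `Some(8)`, while any `Unknown` in the slice yields `None` and leaves the plan alone. Treating "unknown" as "do not push" is the conservative choice: a missed optimization is cheaper than a wrong result.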

Are these changes tested?

Yes, in a new window_limits.slt file, split out to break things up.

Are there any user-facing changes?

Yes, we should patch-fix and notify users. Credit to @Dandandan for finding this bug.

@github-actions github-actions bot added the logical-expr, physical-expr, optimizer, core, sqllogictest, proto, functions, ffi, and physical-plan labels Oct 12, 2025
@avantgardnerio avantgardnerio marked this pull request as ready for review October 13, 2025 21:07
Contributor

@akurmustafa akurmustafa left a comment

I checked the code changes and test cases, they all look good to me. Thanks @avantgardnerio for this PR.

Contributor

@Dandandan Dandandan left a comment

Nice - a great amount of tests and elegant way to solve it.

}
let mut latest_limit: Option<usize> = None;
let mut latest_max = 0;
let mut ctx = TraverseState::default();
Contributor

I wonder if we can / should move the code to LimitPushdown instead?
Might be possible to do it in a single pass and simplify the code.

Contributor Author

I was also of two minds on this. I think it's easier to reason about, debug, and disable as a separate rule, but then interplay between lots of optimizer rules can be really difficult to debug as well.

@Dandandan Dandandan added this pull request to the merge queue Oct 14, 2025
Merged via the queue into apache:main with commit 4e69241 Oct 14, 2025
28 checks passed
@avantgardnerio
Contributor Author

@alamb what's the process for turning this into a patch-fix?

@alamb
Contributor

alamb commented Oct 14, 2025

> @alamb what's the process for turning this into a patch-fix?

If you mean releasing as part of the DataFusion 50.x line, it is to follow this process: #17299 (comment)

Seems like @hareshkh may also be interested in helping so perhaps you can coordinate together

@hareshkh
Contributor

@alamb @avantgardnerio: Creating a DataFusion release issue now; I will cherry-pick the changes after that.

@alamb
Contributor

alamb commented Oct 15, 2025

@hareshkh has created a ticket to track the 50.3.0 release:

@avantgardnerio can you please create a backport PR to branch-50?

avantgardnerio added a commit to coralogix/arrow-datafusion that referenced this pull request Oct 17, 2025
* Add test

* Use ROWS instead of RANGE

* Fix a test

* progress

* window.slt like master

* passing existing tests

* Break out window limit tests

* LimitEffect

* fix a bug

* repartitions

* refactor

* refactor

* fmt

* remove casual

* two phased approach

* refactor into context

* refactor

* refactor

* refactor

* remove comments

* remove deps

* Fix NthValue

* aggregates

* ranking functions

* More tests

* Max lead test

* More tests, JIC

* More tests, JIC

* Notes

* Notes--

(cherry picked from commit 4e69241)
Labels

core Core DataFusion crate ffi Changes to the ffi crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate proto Related to proto crate sqllogictest SQL Logic Tests (.slt)

Development

Successfully merging this pull request may close these issues.

LimitPushPastWindows returns incorrect results for queries with lead()

5 participants