[branch-52] Bump DataFusion rev to f62e79d (skip ensure_distribution rebuild)#47
Merged
Merged
Conversation
…rebuild) Picks up massive-com/arrow-datafusion#57 (cherry-pick of upstream apache/datafusion#22521): ensure_distribution now routes through with_new_children_if_necessary, skipping the expensive plan.with_new_children() rebuild when children are pointer-identical. Atlas's reference query servers need this DF rev to consume the optimization end-to-end. MV crate has to bump in lockstep otherwise atlas ends up with two different DF copies in the workspace (MV depending on 05a6c45, atlas depending on f62e79d), causing type mismatches across the MV/atlas boundary. All 25 + 1 tests pass against the new rev.
There was a problem hiding this comment.
Pull request overview
Bumps the pinned git revision of the DataFusion dependency set used by datafusion-materialized-views to pick up an upstream optimization that avoids unnecessary plan rebuilds in ensure_distribution, helping reduce overhead for plan shapes where no redistribution is needed.
Changes:
- Update DataFusion git rev from
05a6c45tof62e79dacross all directly pinneddatafusion*crates.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Bump DataFusion rev
05a6c45→f62e79dto pick up massive-com/arrow-datafusion#57, the cherry-pick of upstream apache/datafusion#22521.The upstream change makes
ensure_distributionroute throughwith_new_children_if_necessary, which short-circuits viaArc::ptr_eqwhen no child plan was actually replaced. Saves the (often expensive)with_new_childrenrebuild on point-query plan shapes where no redistribution is needed.Why MV needs to move in lockstep
Atlas pins both the MV crate AND DF separately:
datafusion-materialized-views = { rev = "906d9cf" }← this PR will produce a new revdatafusion = { rev = "..." }← atlas will bump tof62e79donce this landsIf the two crates pin different DF revs, cargo resolves them as two separate DF copies in the workspace, which breaks atlas's MV/DF boundary (types don't unify). So MV has to land first.
Tests
cargo test— 25/25 unit + 1/1 integration + 4 ignored doctests pass against the new rev.