-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: Support planning subqueries with OuterReferenceColumn belongs to non-adjacent outer relations
#18806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Support planning subqueries with OuterReferenceColumn belongs to non-adjacent outer relations
#18806
Conversation
OuterReferenceColumn belongs to non-adjacent outer relations attemp 2
OuterReferenceColumn belongs to non-adjacent outer relations attemp 2OuterReferenceColumn belongs to non-adjacent outer relations attemp 2
|
Thanks @duongcongtoai and @duongcongtoai It turns out I got a ping yesterday from Andy Pavlo and his student @yliang412 about the status of subquery support) I just filed a ticket to track just the planning aspect of this query (which I think this PR handles) |
|
I will try and give this a look shortly |
OuterReferenceColumn belongs to non-adjacent outer relations attemp 2OuterReferenceColumn belongs to non-adjacent outer relations
|
|
||
| /// The queries schemas of outer query relations, used to resolve the outer referenced | ||
| /// columns in subquery (recursive aware) | ||
| outer_queries_schemas_stack: Vec<DFSchemaRef>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think this is the key change -- that rather than having a single outer query schema, we need to have a stack, one for each in the relation.
However it seems like we shouldn't need both outer_query_schema as well a stack of them -- outer_query_schemas should always be enough -- maybe you can add push and pop methods that add/remove from the schema stack
One thought I had while looking at this PR is that a more logical structure would be for each PlannerContext to have a reference to its parent -- something like
pub struct PlannerContext<'a> {
/// When planning a subquery, the context of the outer query
parent_context: Option<&'a PlannerContext>
...
}And remove the explicit outer_query_schema
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @duongcongtoai and @kosiew
Our friends at CMU were just asking about this so it was great to see a PR already started
I left some comments. My biggest ones are:
- Can we find a way to unify the schema stacks (rather than having both a field and a stack)?
- Can we restrict this PR to just planning (aka no changes to the optimizer or other stages) -- maybe with tests focused only on planning, such as
datafusion/datafusion/sql/tests/sql_integration.rs
Lines 4380 to 4398 in 4e0161d
#[test] fn test_ambiguous_column_references_with_in_using_join() { let sql = "select p1.id, p1.age, p2.id from person as p1 INNER JOIN person as p2 using(id)"; let plan = logical_plan(sql).unwrap(); assert_snapshot!( plan, @r" Projection: p1.id, p1.age, p2.id Inner Join: Using p1.id = p2.id SubqueryAlias: p1 TableScan: person SubqueryAlias: p2 TableScan: person " ); }

Which issue does this PR close?
Original PR/discussions: #16186
There were some discussion going on regarding handle subquery with depth aware at the planning stage, which is a nice thing to have, but until we implement something like that, we cannot continue implement query decorrelation. But i realize that we can add some minor change to how we plan the subqueries so at least no error is thrown because of ambiguous schema as in #15558:
PlannerContextmaintains an optional outer schema, we just need to replace this field with a stack of outer schemaRationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?