Handle union schema name coercion #16064

Open
wants to merge 1 commit into
base: main

LiaCastaneda

@LiaCastaneda LiaCastaneda commented May 16, 2025

Which issue does this PR close?

Rationale for this change

Physical planning fails on the uppermost projection because schema names are coerced inaccurately while building the union physical node UnionExec (see the issue for a wider explanation).

What changes are included in this PR?

A workaround in union_schema that keeps the field names of the first input, plus an integration test with a reproducer.
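To illustrate the idea behind the workaround, here is a minimal, std-only sketch. It is not the DataFusion implementation: `Field` is a simplified stand-in for the Arrow field type, `union_fields` is a hypothetical helper, and the nullability rule (a column is nullable if any input marks it nullable) is an assumption made only for this example. The point it shows is that the output column name always comes from the first input, so a projection above the union that refers to the first input's names can still resolve them.

```rust
/// Simplified stand-in for an Arrow field (hypothetical; not the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Build the output fields of a union over `inputs` (one Vec<Field> per input).
/// Names are taken from the first input; nullability is merged across all
/// inputs (assumed rule: nullable if any input is nullable).
fn union_fields(inputs: &[Vec<Field>]) -> Vec<Field> {
    // Indexing panics if `inputs` is empty, so later code may rely on
    // at least one input being present.
    let first = &inputs[0];
    (0..first.len())
        .map(|i| {
            // Keep the name of the column from the first input.
            let base = first[i].clone();
            let nullable = inputs.iter().any(|schema| schema[i].nullable);
            Field { name: base.name, nullable }
        })
        .collect()
}

fn main() {
    // Column names here are illustrative only.
    let left = vec![Field { name: "$f3".to_string(), nullable: false }];
    let right = vec![Field { name: "EXPR$0".to_string(), nullable: true }];
    let out = union_fields(&[left, right]);
    println!("{:?}", out);
}
```

In this sketch the second input's name never reaches the output schema, which is the property the workaround relies on.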

Are these changes tested?

Yes, a test was added to the substrait_consumer tests.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the substrait Changes to the substrait crate label May 16, 2025
@LiaCastaneda LiaCastaneda marked this pull request as ready for review May 16, 2025 13:47
@LiaCastaneda LiaCastaneda changed the title Fix union schema name coercion Handle union schema name coercion May 16, 2025
#[tokio::test]
async fn test_multiple_unions() -> Result<()> {
    let plan_str = test_plan_to_string("multiple_unions.json").await?;
    assert_eq!(

How about using assert_snapshot! here? It seems to be a more up-to-date approach for snapshot testing.

Comment on lines -535 to -536
// We can unwrap this because if inputs was empty, this would've already panic'ed when we
// indexed into inputs[0].

I imagine this comment is still relevant, right? We might want to keep it.

@@ -513,7 +513,10 @@ fn union_schema(inputs: &[Arc<dyn ExecutionPlan>]) -> SchemaRef {

let fields = (0..first_schema.fields().len())
.map(|i| {
inputs
let base_field = first_schema.field(i).clone();

How about adding a comment explaining why it is important to keep the column name from the first schema?

Development

Successfully merging this pull request may close these issues.

Input field name $f3 does not match with the projection expression ...
2 participants