Regression: DataFrame::schema returns incorrect schema for NATURAL JOIN
#14058
Labels: bug, help wanted, regression
Describe the bug
Affected Version: 42.x, 43.x, 44.x (regression since 41.x)
The DataFrame::schema (=> LogicalPlan::schema) method returns a schema that includes all columns from the joined sources (using NATURAL JOIN), including columns not present in the final output. This behavior is incorrect and inconsistent with the documented behavior:
To Reproduce
Simple MRE here:
Deps:
Expected behavior
The schema returned by DataFrame::schema should match the structure of the output produced by collect, collect_partitioned, etc. Specifically:
Or, if the current behavior is intended, the documentation should be updated to make clear how to access the output schema.
However, I find the previous behavior correct and useful (e.g., for getting the schema before calling methods like write_parquet/csv/json).
Additional context
This is a regression: the method worked correctly in version 41.x.x and earlier.
It also probably points to missing test coverage for particular code paths. In that sense, it is not enough to compare SQL execution results; the reported schema needs to be checked as well.
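To make the expected behavior concrete, here is a minimal std-only sketch of the column sets involved. It is not DataFusion code: both function names are hypothetical, schemas are modelled as plain lists of column names, and the column order (shared columns first, then left-only, then right-only) follows the usual SQL convention for NATURAL JOIN rather than anything confirmed for DataFusion. The second function models the reported symptom as a plain concatenation of both inputs, which is also an assumption.

```rust
/// Hypothetical model: column names of `left NATURAL JOIN right`.
/// Shared columns appear once, in left-side order, followed by the
/// left-only columns and then the right-only columns.
fn natural_join_columns(left: &[&str], right: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    // 1. Columns present on both sides, emitted once.
    for l in left {
        if right.iter().any(|r| r == l) {
            out.push(l.to_string());
        }
    }
    // 2. Columns that exist only on the left side.
    for l in left {
        if !right.iter().any(|r| r == l) {
            out.push(l.to_string());
        }
    }
    // 3. Columns that exist only on the right side.
    for r in right {
        if !left.iter().any(|l| l == r) {
            out.push(r.to_string());
        }
    }
    out
}

/// Hypothetical model of the reported symptom (an assumption): every
/// column from both inputs, so shared columns show up twice.
fn concatenated_columns(left: &[&str], right: &[&str]) -> Vec<String> {
    left.iter().chain(right.iter()).map(|c| c.to_string()).collect()
}
```

For inputs with columns (id, name) and (id, city), the first function yields id, name, city, while the second yields id, name, id, city. A regression test along these lines would compare the schema reported before execution with the schema of the collected batches, instead of comparing only the result rows.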