mbutrovich changed the title from "Parquet: coerce_int96 does not work for int96 in nested types with repeated names" to "Parquet: coerce_int96 does not work for int96 in nested types, especially with repeated names" on May 15, 2025
mbutrovich changed the title from "Parquet: coerce_int96 does not work for int96 in nested types, especially with repeated names" to "Parquet: coerce_int96 does not work for int96 in nested types" on May 15, 2025
Describe the bug
The logic that coerces timestamps to a different resolution iterates through the fields and uses each field's name in the Parquet schema as the key to match against the Arrow schema.
datafusion/datafusion/datasource-parquet/src/file_format.rs, line 586 in 66a7423
However, this is insufficient to disambiguate nested fields (consider a schema with structs that each have an `id` field). I think we might need a combination of the Parquet ColumnDesc's path and a normalized Arrow schema to fix the mapping.
To Reproduce
CometFuzzTestSuite for INT96 reproduces the issue immediately. I will work on including an slt test in the fix PR.
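Until the slt test exists, the ambiguity can be illustrated without any Parquet or Arrow dependency. The sketch below is not the DataFusion code: `Leaf`, `index_by_name`, `index_by_path`, and `example_leaves` are hypothetical names, and the two-struct schema with an `id` field in each is an assumption chosen to mirror the example above. It only shows that keying leaf columns by name collapses the two `id` fields into one entry, while keying by full path keeps them apart.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Parquet leaf column: its full path from the
/// schema root and its physical type (as a plain string for illustration).
#[derive(Debug, Clone)]
struct Leaf {
    path: Vec<&'static str>,
    physical_type: &'static str,
}

/// Index leaves by their last path component only.
/// Leaves that share a name overwrite each other in the map.
fn index_by_name(leaves: &[Leaf]) -> HashMap<String, &'static str> {
    leaves
        .iter()
        .map(|l| (l.path.last().unwrap().to_string(), l.physical_type))
        .collect()
}

/// Index leaves by their full dotted path, which is unique per leaf.
fn index_by_path(leaves: &[Leaf]) -> HashMap<String, &'static str> {
    leaves
        .iter()
        .map(|l| (l.path.join("."), l.physical_type))
        .collect()
}

/// Assumed schema: two structs `a` and `b`, each with an `id` field,
/// where only `a.id` is stored as INT96.
fn example_leaves() -> Vec<Leaf> {
    vec![
        Leaf { path: vec!["a", "id"], physical_type: "INT96" },
        Leaf { path: vec!["b", "id"], physical_type: "INT64" },
    ]
}

fn main() {
    let leaves = example_leaves();
    let by_name = index_by_name(&leaves);
    let by_path = index_by_path(&leaves);

    // The name-keyed map cannot tell `a.id` from `b.id`.
    println!("keyed by name: {:?}", by_name);
    // The path-keyed map keeps one entry per leaf.
    println!("keyed by path: {:?}", by_path);
}
```

With the name-keyed map, whichever `id` is inserted last decides the type seen for both fields, so a coercion driven by that map can touch the wrong column or miss the INT96 one. A path-keyed lookup is one way to avoid that, though the real fix would also need the Arrow side normalized to the same paths.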
Expected behavior
No response
Additional context