add_files support partitioned tables #531
@syun64 Thanks for working on this, this looks great!
Thank you very much for the detailed review @Fokko. I've adopted all of your review comments 👍 - I would appreciate another round of review!
This looks good, thanks again for the work @syun64
Thank you! As always! @Fokko
As a follow-up to #506, this PR introduces support for adding files as DataFiles to partitioned tables.
Instead of the less accurate method of parsing the file path and inferring partition values from it under an assumed Hive partitioning scheme, this approach requires that the partition values be present in the Parquet files. It infers the partition values from the Parquet metadata footer, using the lower and upper bound values.
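A minimal sketch of the bounds-based inference, not the code in this PR: it assumes the footer statistics for the partition column have already been read into per-row-group (min, max) pairs. `RowGroupStats` and `infer_identity_partition_value` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical stand-in for one row group's min/max footer statistics."""

    min_value: Any
    max_value: Any


def infer_identity_partition_value(stats: Iterable[RowGroupStats]) -> Any:
    """Infer a single partition value from footer bounds alone.

    Aggregates the row-group statistics into file-level lower and upper
    bounds, then requires the two bounds to be equal, which means the
    column holds exactly one distinct value in this file.
    """
    stats = list(stats)
    if not stats:
        raise ValueError("no statistics available for the partition column")

    lower = min(s.min_value for s in stats)
    upper = max(s.max_value for s in stats)

    if lower != upper:
        raise ValueError(
            f"file spans more than one partition value: {lower!r}..{upper!r}"
        )
    return lower
```

No data pages are read here: the only inputs are the statistics, which is why the file must actually contain the partition column for this to work.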
The optimization to use the lower and upper bound values saves the client from reading the entire Parquet file, since it can rely on the aggregated statistics in the Parquet metadata footer. As a result, this implementation of add_files does not support tables with partition transforms that are non-linear (that is, transforms for which preserves_order is false). Among the existing Transforms, the following Transform partitions are supported:
The following are not:
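As a rough illustration of why the order-preservation requirement matters, the sketch below compares an order-preserving transform with a non-order-preserving one. Everything in it is an assumption for illustration: `year_transform` mimics a years-since-1970 transform, and `toy_bucket_transform` is a plain modulo that stands in for a hash-based bucket transform. Neither is PyIceberg's implementation.

```python
import datetime as dt
from typing import Any, Callable, Optional


def year_transform(d: dt.date) -> int:
    """Order-preserving: a later date never maps to an earlier year."""
    return d.year - 1970


def toy_bucket_transform(v: int, n: int = 4) -> int:
    """Not order-preserving. Toy modulo bucket, standing in for a hash."""
    return v % n


def single_partition_from_bounds(
    lower: Any, upper: Any, transform: Callable[[Any], Any]
) -> Optional[Any]:
    """Return the partition value if both bounds map to it, else None."""
    lo, hi = transform(lower), transform(upper)
    return lo if lo == hi else None


# Order-preserving: equal transformed bounds pin down every value in between.
year = single_partition_from_bounds(
    dt.date(2023, 1, 5), dt.date(2023, 11, 30), year_transform
)

# Not order-preserving: the bounds 0 and 4 agree on bucket 0, but a row
# holding 1 (which lies between the bounds) belongs to bucket 1.
bucket = single_partition_from_bounds(0, 4, toy_bucket_transform)
hidden_row_bucket = toy_bucket_transform(1)
```

With the year transform, matching bounds guarantee that the whole file belongs to one partition. With the toy bucket, matching bounds say nothing about the rows in between, so the footer statistics alone cannot be trusted.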