-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Feat: replace sort order #1500
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Generally lgtm, added a few nit comments
@@ -404,6 +405,20 @@ def update_schema(self, allow_incompatible_changes: bool = False, case_sensitive | |||
name_mapping=self.table_metadata.name_mapping(), | |||
) | |||
|
|||
def replace_sort_order(self, case_sensitive: bool = True) -> ReplaceSortOrder: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wdyt about update_sort_order
instead?
from the spec,
A data or delete file is associated with a sort order by the sort order's id within [a manifest](https://iceberg.apache.org/spec/#manifests). Therefore, the table must declare all the sort orders for lookup. A table could also be configured with a default sort order id, indicating how the new data should be sorted by default. Writers should use this default sort order to sort the data on write, but are not required to if the default order is prohibitively expensive, as it would be for streaming writes.
All sort orders are kept in the manifest in the sort-orders
field.
SortField(source_id=1, transform=IdentityTransform(), direction=SortDirection.ASC, null_order=NullOrder.NULLS_FIRST), | ||
SortField(source_id=2, transform=IdentityTransform(), direction=SortDirection.DESC, null_order=NullOrder.NULLS_LAST), | ||
order_id=1, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test for modifying sort order of a table with existing sort order?
from pyiceberg.table import Transaction | ||
|
||
|
||
class SortOrderBuilder: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we need the builder here? Or just follow UpdateSchema's pattern
Thanks for the comments 🙌 @kevinjqliu . Will look at your comments this weekend. |
This PR adds functionality to replace a table's sort order. Closes #1245
Some basic tests are implemented but need to be expanded.
Currently, a new sort order ID is assigned. This adds a sort order to the list of available sort orders rather than replacing it.
@Fokko do you mind taking a look at this and telling me what you think? Thanks in advance!