Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Feat: replace sort order #1500

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

JasperHG90
Copy link

This PR adds functionality to replace a table's sort order. Closes #1245

Some basic tests are implemented but need to be expanded.

Currently, a new sort order ID is assigned. This adds a sort order to the list of available sort orders rather than replacing it.

@Fokko do you mind taking a look at this and telling me what you think? Thanks in advance!

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
Generally lgtm, added a few nit comments

@@ -404,6 +405,20 @@ def update_schema(self, allow_incompatible_changes: bool = False, case_sensitive
name_mapping=self.table_metadata.name_mapping(),
)

def replace_sort_order(self, case_sensitive: bool = True) -> ReplaceSortOrder:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wdyt about update_sort_order instead?

from the spec,

A data or delete file is associated with a sort order by the sort order's id within [a manifest](https://iceberg.apache.org/spec/#manifests). Therefore, the table must declare all the sort orders for lookup. A table could also be configured with a default sort order id, indicating how the new data should be sorted by default. Writers should use this default sort order to sort the data on write, but are not required to if the default order is prohibitively expensive, as it would be for streaming writes.

All sort orders are kept in the manifest in the sort-orders field.

SortField(source_id=1, transform=IdentityTransform(), direction=SortDirection.ASC, null_order=NullOrder.NULLS_FIRST),
SortField(source_id=2, transform=IdentityTransform(), direction=SortDirection.DESC, null_order=NullOrder.NULLS_LAST),
order_id=1,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test for modifying sort order of a table with existing sort order?

from pyiceberg.table import Transaction


class SortOrderBuilder:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need the builder here? Or just follow UpdateSchema's pattern

@JasperHG90
Copy link
Author

Thanks for the comments 🙌 @kevinjqliu . Will look at your comments this weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feat] Support update table's sort order
2 participants