-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove initial_change
when CreateTableTransaction apply table updates on an empty metadata
#1219
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR and adding the deprecation for initial_change
.
I went down a deep rabbit hole on initial_change
, the default values for TableMetadata
, and how table updates are applied.
Let me clean up my notes and I'll post it here for reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @HonahX. I like this solution. I've verified using the original issue's environment and it works.
Do you mind also removing these instances of initial_change
?
iceberg-python/pyiceberg/table/__init__.py
Lines 706 to 721 in 9a6a9a1
AddSchemaUpdate(schema_=schema, last_column_id=schema.highest_field_id, initial_change=True), | |
SetCurrentSchemaUpdate(schema_id=-1), | |
) | |
spec: PartitionSpec = table_metadata.spec() | |
if spec.is_unpartitioned(): | |
self._updates += (AddPartitionSpecUpdate(spec=UNPARTITIONED_PARTITION_SPEC, initial_change=True),) | |
else: | |
self._updates += (AddPartitionSpecUpdate(spec=spec, initial_change=True),) | |
self._updates += (SetDefaultSpecUpdate(spec_id=-1),) | |
sort_order: Optional[SortOrder] = table_metadata.sort_order_by_id(table_metadata.default_sort_order_id) | |
if sort_order is None or sort_order.is_unsorted: | |
self._updates += (AddSortOrderUpdate(sort_order=UNSORTED_SORT_ORDER, initial_change=True),) | |
else: | |
self._updates += (AddSortOrderUpdate(sort_order=sort_order, initial_change=True),) |
Also, should we remove the unrelated changes from this PR? such as the library version upgrade/etc.
I want to get this PR in before the next release, please let me know if there's anything I can do to move it along! |
initial_change
when CreateTableTransaction apply table updates on an empty metadatainitial_change
when CreateTableTransaction apply table updates on an empty metadata
@kevinjqliu Thanks for reviewing this! I've updated the PR. It will be great to include this in the next release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for catching this @kevinjqliu and fixing this @HonahX
@@ -267,11 +285,10 @@ def _( | |||
elif update.format_version == base_metadata.format_version: | |||
return base_metadata | |||
|
|||
updated_metadata_data = copy(base_metadata.model_dump()) | |||
updated_metadata_data["format-version"] = update.format_version | |||
updated_metadata = base_metadata.model_copy(update={"format_version": update.format_version}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -90,7 +90,13 @@ class AddSchemaUpdate(IcebergBaseModel): | |||
# This field is required: https://github.com/apache/iceberg/pull/7445 | |||
last_column_id: int = Field(alias="last-column-id") | |||
|
|||
initial_change: bool = Field(default=False, exclude=True) | |||
initial_change: bool = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is clean to go through the validation cycle, but I would be surprised if anyone would be relying on these properties
…es on an empty metadata (apache#1219) * make table metadata without validaiton * update deletes test * remove info * add deprecation message * revert lib version updates * remove initial_changes usage in code * move test to integration * fix typo * update error string
…es on an empty metadata (apache#1219) * make table metadata without validaiton * update deletes test * remove info * add deprecation message * revert lib version updates * remove initial_changes usage in code * move test to integration * fix typo * update error string
Fixes #864
Remove
initial_change
as described in Kevin's #950. This PR relies on Pydantic model's model_construct to create a model without validation. This gives us the flexibility to upgrade from TableMetadataV1 to TableMetadataV2 without worrying about validators terminating the process because of some incomplete fields.