Remove initial_change when CreateTableTransaction apply table updates on an empty metadata #1219

Merged on Nov 1, 2024 (11 commits)

@HonahX HonahX commented Oct 6, 2024

Fixes #864

Remove initial_change as described in Kevin's #950. This PR relies on Pydantic's model_construct to create a model without validation. That lets us upgrade from TableMetadataV1 to TableMetadataV2 without validators aborting the process because some fields are still incomplete.
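
For readers unfamiliar with the pattern, here is a minimal sketch of building a model without validation and validating only at the end. It assumes Pydantic v2; the `TableMeta` class and its fields are hypothetical stand-ins, not PyIceberg's actual metadata classes.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class TableMeta(BaseModel):
    """Hypothetical, much-reduced stand-in for a table metadata model."""

    format_version: int = 1
    location: str  # required, so normal construction without it fails
    schemas: List[str] = Field(default_factory=list)


# Normal construction validates and rejects the missing required field.
try:
    TableMeta()
    rejected = False
except ValidationError:
    rejected = True

# model_construct() skips validation, so an "empty" instance can exist.
draft = TableMeta.model_construct()

# Fill in the gaps step by step; model_copy(update=...) does not validate either.
draft = draft.model_copy(update={"location": "s3://bucket/tbl"})
draft = draft.model_copy(update={"format_version": 2, "schemas": ["id: long"]})

# Validate once at the end, when every required field is present.
final = TableMeta.model_validate(draft.model_dump())
```

The trade-off is that the intermediate `draft` objects may be incomplete, so the final `model_validate` call is what restores the usual guarantees.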

@kevinjqliu kevinjqliu left a comment

Thanks for the PR and for adding the deprecation for initial_change.

I went down a deep rabbit hole on initial_change, the default values for TableMetadata, and how table updates are applied.
Let me clean up my notes and I'll post them here for reference.

@kevinjqliu kevinjqliu left a comment

Thanks for the PR @HonahX. I like this solution. I've verified using the original issue's environment and it works.

Do you mind also removing these instances of initial_change?

        AddSchemaUpdate(schema_=schema, last_column_id=schema.highest_field_id, initial_change=True),
        SetCurrentSchemaUpdate(schema_id=-1),
    )
    spec: PartitionSpec = table_metadata.spec()
    if spec.is_unpartitioned():
        self._updates += (AddPartitionSpecUpdate(spec=UNPARTITIONED_PARTITION_SPEC, initial_change=True),)
    else:
        self._updates += (AddPartitionSpecUpdate(spec=spec, initial_change=True),)
    self._updates += (SetDefaultSpecUpdate(spec_id=-1),)
    sort_order: Optional[SortOrder] = table_metadata.sort_order_by_id(table_metadata.default_sort_order_id)
    if sort_order is None or sort_order.is_unsorted:
        self._updates += (AddSortOrderUpdate(sort_order=UNSORTED_SORT_ORDER, initial_change=True),)
    else:
        self._updates += (AddSortOrderUpdate(sort_order=sort_order, initial_change=True),)

Also, should we remove the unrelated changes from this PR, such as the library version upgrades?

dev/Dockerfile (outdated, resolved)
@kevinjqliu

I want to get this PR in before the next release, please let me know if there's anything I can do to move it along!

@HonahX HonahX changed the title from "Another way to remove initial_change when CreateTableTransaction apply table updates on an empty metadata" to "Remove initial_change when CreateTableTransaction apply table updates on an empty metadata" Oct 30, 2024
@HonahX HonahX marked this pull request as ready for review October 30, 2024 05:25
HonahX commented Oct 30, 2024

@kevinjqliu Thanks for reviewing this! I've updated the PR. It would be great to include this in the next release.

@kevinjqliu kevinjqliu left a comment

LGTM!

@kevinjqliu kevinjqliu requested review from Fokko and sungwy October 30, 2024 15:40
@kevinjqliu kevinjqliu added this to the PyIceberg 0.8.0 release milestone Oct 30, 2024
@Fokko Fokko left a comment

Looks good, thanks for catching this @kevinjqliu and fixing this @HonahX

@@ -267,11 +285,10 @@ def _(
     elif update.format_version == base_metadata.format_version:
         return base_metadata

-    updated_metadata_data = copy(base_metadata.model_dump())
-    updated_metadata_data["format-version"] = update.format_version
+    updated_metadata = base_metadata.model_copy(update={"format_version": update.format_version})
Nice!
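
For context, the shape of the handler in the hunk above can be sketched with the standard library alone. This is an illustration under assumptions, not PyIceberg's code: `Metadata`, `UpgradeFormatVersion`, and `apply_update` are hypothetical names, `dataclasses.replace` stands in for Pydantic's `model_copy(update=...)`, and the downgrade guard is a guess at the branch that precedes the visible `elif`.

```python
from dataclasses import dataclass, replace
from functools import singledispatch


@dataclass(frozen=True)
class Metadata:
    """Hypothetical stand-in for table metadata."""

    format_version: int = 1
    table_uuid: str = ""


@dataclass(frozen=True)
class UpgradeFormatVersion:
    """Hypothetical update object carrying the target version."""

    format_version: int


@singledispatch
def apply_update(update: object, base: Metadata) -> Metadata:
    # Fallback for update types that have no registered handler.
    raise NotImplementedError(f"Unsupported update: {update!r}")


@apply_update.register
def _(update: UpgradeFormatVersion, base: Metadata) -> Metadata:
    if update.format_version < base.format_version:
        # Assumed guard: the branch before the `elif` is not visible in the hunk.
        raise ValueError("Cannot downgrade the format version")
    elif update.format_version == base.format_version:
        return base  # same version, nothing to change
    # Copy with one field changed instead of dumping to a dict and rebuilding.
    return replace(base, format_version=update.format_version)


v1 = Metadata()
v2 = apply_update(UpgradeFormatVersion(2), v1)
```

Copying with a single changed field leaves `v1` untouched and avoids a full serialize-and-reparse round trip, which is also what makes it tolerant of metadata that is not yet complete.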

@@ -90,7 +90,13 @@ class AddSchemaUpdate(IcebergBaseModel):
     # This field is required: https://github.com/apache/iceberg/pull/7445
     last_column_id: int = Field(alias="last-column-id")

-    initial_change: bool = Field(default=False, exclude=True)
+    initial_change: bool = Field(
It is clean to go through the validation cycle, but I would be surprised if anyone would be relying on these properties
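
As a rough illustration of keeping a deprecated flag around for one release cycle, here is a standard-library sketch. The `AddSchema` class and the warning text are hypothetical; the PR itself may use a project-specific deprecation helper instead of `warnings.warn`.

```python
import warnings
from dataclasses import dataclass


@dataclass
class AddSchema:
    """Hypothetical stand-in for an update object with a deprecated flag."""

    last_column_id: int
    initial_change: bool = False  # still accepted so existing callers do not break

    def __post_init__(self) -> None:
        if self.initial_change:
            # stacklevel=3 attributes the warning to the caller of AddSchema(...),
            # skipping __post_init__ and the generated __init__.
            warnings.warn(
                "initial_change is deprecated",
                DeprecationWarning,
                stacklevel=3,
            )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    AddSchema(last_column_id=3)  # no warning
    AddSchema(last_column_id=3, initial_change=True)  # emits DeprecationWarning
```

Callers that never set the flag see no change, while anyone still passing it gets a warning before the field is removed.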

@kevinjqliu kevinjqliu merged commit 3bdd458 into apache:main Nov 1, 2024
7 checks passed
@kevinjqliu

Thanks @HonahX for the fix and @Fokko for the review! One step closer to 0.8.0!

sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 7, 2024
…es on an empty metadata (apache#1219)

* make table metadata without validaiton

* update deletes test

* remove info

* add deprecation message

* revert lib version updates

* remove initial_changes usage in code

* move test to integration

* fix typo

* update error string
Successfully merging this pull request may close these issues.

[🐞] Collection of a few bugs
3 participants