-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
abort the whole table transaction if any updates in the transaction has failed #1246
abort the whole table transaction if any updates in the transaction has failed #1246
Conversation
Thanks for unblocking the testing actions! |
Thanks for the PR @stevie9868. This sounds like an important bug to address. Do you know if this bug only applies to the PS I reran the CI |
pyiceberg/table/__init__.py
Outdated
) -> None: | ||
"""Close and commit the transaction, or handle exceptions.""" | ||
# Only commit the full transaction, if there is no exception in all updates on the chain | ||
if exctb is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the difference between exctype
, excinst
, and exctb
here? Why do we use exctb
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about this part. Typically __exit__
is called as the last step for the with
statement.
Here, __exit__
calls the self.commit_transaction()
which will process the transactions.
It seems like the issue here is to catch partial exceptions from the self.commit_transaction()
which wont be caught here
Im a bit confused on the chain of events. Here's what I found digging through the code:
iceberg-python/pyiceberg/table/__init__.py Lines 1044 to 1045 in de976fe
In the transaction's iceberg-python/pyiceberg/table/__init__.py Lines 507 to 516 in de976fe
iceberg-python/pyiceberg/table/__init__.py Lines 594 to 600 in de976fe
and self.update_snapshot(snapshot_properties=snapshot_properties).fast_append() also creates a UpdateSnapshot (_FastAppendFiles ).iceberg-python/pyiceberg/table/__init__.py Lines 594 to 600 in de976fe
Both iceberg-python/pyiceberg/table/update/__init__.py Lines 62 to 70 in de976fe
iceberg-python/pyiceberg/table/update/snapshot.py Lines 241 to 279 in de976fe
At this point, nothing has been committed yet. All updates are queued up in the transaction. |
Please let me know if the above makes sense |
Ah, do you have iceberg-python/pyiceberg/table/__init__.py Lines 260 to 261 in de976fe
This seems like a potential footgun. Perhaps we should get rid of |
@kevinjqliu I believe if self.update_snapshot(snapshot_properties=snapshot_properties).fast_append throws an exception, it will still trigger the transaction.exit, which will have the commit_transaction then only contain 1 update, which is the delete in this case as the append failed to be added into the updates list For example, if the fast_append() failed any operation during the commit, (in our case, we see aws s3 exception), then the exception will propagate back to the transaction.exit. Let me know if I miss anything, thanks, and I also prefer getting rid of _autocommit in the Transaction class |
Thank you, I think this would potentially apply to all functions that triy to combine more than one update into one transaction. |
ah, I don't think we set _autocommit to true |
@stevie9868 @kevinjqliu Thanks for the great PR and discussions! I agree that there is some issue with the current Transaction mechanism: the The following pattern has been the most common practice of updating tables in pyiceberg since the beginning with tbl.transaction() as txn:
txn.overwrite(...)
.... The "with" statement will ensure that the context manager---Transaction object's A simpler example would be: pa_table_with_column = pa.Table.from_pydict(
{
"foo": ["a", None, "z"],
"bar": [19, None, 25],
},
schema=pa.schema([
pa.field("foo", pa.large_string(), nullable=True),
pa.field("bar", pa.int32(), nullable=True),
]),
)
tbl = catalog.create_table(identifier=identifier, schema=pa_table_with_column.schema)
with pytest.raises(ValueError):
with tbl.transaction() as txn:
txn.append(pa_table_with_column)
raise ValueError
txn.append(pa_table_with_column)
assert len(tbl.scan().to_pandas()) == 0 Since I explicitly raise an error during the transaction, the whole transaction should be abandoned. But this code block still insert 3 rows (first append) to the table. Please let me know if these make sense. Would love to hear your thoughts on this! |
Thanks for providing a detailed example, and I agree that we should only call commit_transaction when there is no exception along the way. |
The iceberg-python/pyiceberg/table/__init__.py Lines 991 to 1006 in de976fe
iceberg-python/pyiceberg/table/__init__.py Lines 1077 to 1078 in de976fe
iceberg-python/pyiceberg/table/__init__.py Lines 976 to 989 in de976fe
The idea is to make the code simpler if we only want to evolve schema/spec/... with table.update_schema() as update:
update.add_column("some_field", IntegerType(), "doc") instead of another with..transaction wrapper with table.transaction() as transaction:
with transaction.update_schema() as update_schema:
update.add_column("some_other_field", IntegerType(), "doc") Since the recommended way to start a transaction is txn = tbl.transaction() , this option in general is not exposed to user directly: #471 (comment) However, there may still be some concerns around this since Please let me know what you think! |
ah, I think the parameter autocommit is private in Transaction class. Having a doc is a good first step, and I believe currently autocommit=true will be applied to ManagedSnapshot, UpdateSpec, and UpdateSchema. Also correct me if I am wrong, I believe the current java iceberg library doesn't have the auto_commit option in the Transaction class? |
34ca959
to
f7a7a87
Compare
I have also updated the PR based on existing comments, and thanks everyone for the inputs! |
pyiceberg/table/__init__.py
Outdated
"""Close and commit the transaction, or handle exceptions.""" | ||
# Only commit the full transaction, if there is no exception in all updates on the chain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Close and commit the transaction, or handle exceptions.""" | |
# Only commit the full transaction, if there is no exception in all updates on the chain | |
"""Close and commit the transaction if no exceptions have been raised.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exit the runtime context related to this object. The parameters describe the exception that caused the context to be exited. If the context was exited without an exception, all three arguments will be None.
From https://docs.python.org/3/reference/datamodel.html#object.__exit__
tests/catalog/test_base.py
Outdated
@@ -766,3 +766,26 @@ def test_table_properties_raise_for_none_value(catalog: InMemoryCatalog) -> None | |||
with pytest.raises(ValidationError) as exc_info: | |||
_ = given_catalog_has_a_table(catalog, properties=property_with_none) | |||
assert "None type is not a supported value in properties: property_name" in str(exc_info.value) | |||
|
|||
|
|||
def test_abort_table_transaction_on_exception(catalog: InMemoryCatalog) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can this test be moved to tests/table/test_init.py
instead? it doesn't really belong in the "catalog"
tests/catalog/test_base.py
Outdated
# Populate some initial data | ||
data = pa.Table.from_pylist( | ||
[{"x": 1, "y": 2, "z": 3}, {"x": 4, "y": 5, "z": 6}], | ||
schema=TEST_TABLE_SCHEMA.as_arrow(), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can just use the test fixture arrow_table_with_null: pa.Table
like so
spark: SparkSession, session_catalog: Catalog, arrow_table_with_null: pa.Table, format_version: int |
tests/catalog/test_base.py
Outdated
with pytest.raises(ValueError): | ||
with tbl.transaction() as txn: | ||
txn.overwrite(data) | ||
raise ValueError | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with pytest.raises(ValueError): | |
with tbl.transaction() as txn: | |
txn.overwrite(data) | |
raise ValueError | |
with pytest.raises(ValueError): | |
with tbl.transaction() as txn: | |
txn.overwrite(data) | |
raise ValueError | |
txn.overwrite(data) |
maybe another call after the exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops yea, honah already mentioned this
Thanks @HonahX @stevie9868! Glad we were able to get to the bottom of this important correctness issue. I started #1253 to continue the conversation on |
4f6faa9
to
945d61d
Compare
I have decided to move the test under
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for addressing the previous comments.
I left a few nit comments on making the test more readable
Co-authored-by: Kevin Liu <[email protected]>
Co-authored-by: Kevin Liu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
theres a weird linting error, can you try running make lint
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thank you @stevie9868 for discovering and fixing this important correctness bug! |
Thanks @kevinjqliu @HonahX for the guidance and the quick review, really appreciate it! |
…as failed (apache#1246) * abort the whole transaction if any update on the chain has failed * Update tests/integration/test_writes/test_writes.py Co-authored-by: Kevin Liu <[email protected]> * Update tests/integration/test_writes/test_writes.py Co-authored-by: Kevin Liu <[email protected]> * add type:ignore to prevent lint error --------- Co-authored-by: Yingjian Wu <[email protected]> Co-authored-by: Kevin Liu <[email protected]>
…as failed (apache#1246) * abort the whole transaction if any update on the chain has failed * Update tests/integration/test_writes/test_writes.py Co-authored-by: Kevin Liu <[email protected]> * Update tests/integration/test_writes/test_writes.py Co-authored-by: Kevin Liu <[email protected]> * add type:ignore to prevent lint error --------- Co-authored-by: Yingjian Wu <[email protected]> Co-authored-by: Kevin Liu <[email protected]>
We have encountered a data loss issue when using pyIceberg to perform an overwrite operation. Typically, an overwrite operation involves creating both a delete snapshot and an append snapshot. However, if an exception occurs during the creation of the append snapshot, the current code still attempts to commit the delete snapshot, leading to potential data loss. One thing to note is this does not apply to only overwrite but potentially other operations as well.
To address this issue, we need to ensure that the entire transaction is aborted if any part of the update process fails.
Also provided a simple test case, where before this change, the transaction will only contains a delete snapshot update deleting the data. Whereas after this fix, we still keep the same data before the partially failed transaction since the whole transaction is now aborted.