Support Time Travel in InspectTable.entries #599
Changes from 2 commits
@@ -3253,6 +3253,15 @@ def __init__(self, tbl: Table) -> None:
         except ModuleNotFoundError as e:
             raise ModuleNotFoundError("For metadata operations PyArrow needs to be installed") from e

+    def _snapshot(self, snapshot_id: Optional[int] = None) -> Optional[Snapshot]:
+        if snapshot_id:
sungwy marked this conversation as resolved.
+            if snapshot := self.tbl.metadata.snapshot_by_id(snapshot_id):
+                return snapshot
+            else:
+                raise ValueError(f"Cannot find snapshot with ID {snapshot_id}")
nit: if the return value is

Hi @kevinjqliu thanks for the review. I thought about this, and I stand by this behavior / type annotation. This is my rationale:

And I believe the above is because a newly created table isn't required to have a snapshot

👍 makes sense, thanks for the explanation

No problem, and thank you again for the review!

I agree that we should raise an error when the snapshot cannot be found. What do you think of updating the signature to

@Fokko Sure - I thought it would be more correct to return an empty metadata table (entries, partitions, etc) if there's no snapshot in the table than raising an Exception, but this way I think we avoid extra
+
+        return self.tbl.metadata.current_snapshot()
+
     def snapshots(self) -> "pa.Table":
         import pyarrow as pa

@@ -3287,7 +3296,7 @@ def snapshots(self) -> "pa.Table":
             schema=snapshots_schema,
         )

-    def entries(self) -> "pa.Table":
+    def entries(self, snapshot_id: Optional[int] = None) -> "pa.Table":
         import pyarrow as pa

         from pyiceberg.io.pyarrow import schema_to_pyarrow
@@ -3346,7 +3355,7 @@ def _readable_metrics_struct(bound_type: PrimitiveType) -> pa.StructType:
         ])

         entries = []
-        if snapshot := self.tbl.metadata.current_snapshot():
+        if snapshot := self._snapshot(snapshot_id):
             for manifest in snapshot.manifests(self.tbl.io):
                 for entry in manifest.fetch_manifest_entry(io=self.tbl.io):
                     column_sizes = entry.data_file.column_sizes or {}
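With this change, `entries` takes an optional `snapshot_id`, so a caller can read the manifest entries as of an earlier snapshot. A minimal sketch of one possible use follows. `entry_counts_by_snapshot` is a hypothetical helper, not part of the PR, and `tbl` is assumed to be a loaded pyiceberg `Table` whose `inspect.entries` has the signature shown in this diff.

```python
from typing import Any, Dict


def entry_counts_by_snapshot(tbl: Any) -> Dict[int, int]:
    """Hypothetical helper: number of manifest entries visible at each snapshot.

    Assumes `tbl.inspect.entries` accepts the `snapshot_id` argument added in
    this diff and returns a pyarrow Table (which exposes `num_rows`).
    """
    counts: Dict[int, int] = {}
    for snapshot in tbl.metadata.snapshots:
        # Time travel: read the entries as of this snapshot, not the current one.
        entries = tbl.inspect.entries(snapshot_id=snapshot.snapshot_id)
        counts[snapshot.snapshot_id] = entries.num_rows
    return counts
```

Calling `tbl.inspect.entries()` with no argument keeps the previous behavior of reading the current snapshot.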
Can we think of a better name that describes the function's behavior? :)
Couldn't think of a better one... is this a bit better than _snapshot? ^^;
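For reference, the behavior the helper's name has to describe can be sketched as below. This is an illustration under assumptions, not the PR's code. `FakeSnapshot`, `FakeMetadata`, and `resolve_snapshot` are stand-in names invented here so the example runs without pyiceberg. The sketch also checks `snapshot_id is not None` rather than the diff's truthiness test, which is a choice made for the sketch and not something the PR states.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeSnapshot:
    """Stand-in for pyiceberg's Snapshot; only the ID matters here."""
    snapshot_id: int


@dataclass
class FakeMetadata:
    """Stand-in for table metadata; a newly created table has no snapshots."""
    snapshots: Dict[int, FakeSnapshot] = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None

    def snapshot_by_id(self, snapshot_id: int) -> Optional[FakeSnapshot]:
        return self.snapshots.get(snapshot_id)

    def current_snapshot(self) -> Optional[FakeSnapshot]:
        if self.current_snapshot_id is None:
            return None
        return self.snapshots.get(self.current_snapshot_id)


def resolve_snapshot(
    metadata: FakeMetadata, snapshot_id: Optional[int] = None
) -> Optional[FakeSnapshot]:
    # Explicit ID: it must exist, otherwise raise (mirrors the ValueError in the diff).
    if snapshot_id is not None:
        if snapshot := metadata.snapshot_by_id(snapshot_id):
            return snapshot
        raise ValueError(f"Cannot find snapshot with ID {snapshot_id}")
    # No ID given: fall back to the current snapshot, which may be None.
    return metadata.current_snapshot()
```

The `Optional` return type covers only the fallback path: an unknown explicit ID is an error, while a table with no snapshot yet yields `None`, so the caller can return an empty metadata table.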