-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Strict projection #539
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks @Fokko ! Just some tiny comments.
pyiceberg/transforms.py
Outdated
elif isinstance(pred, BoundSetPredicate): | ||
return pred.as_unbound(Reference(name), pred.literals) | ||
else: | ||
raise ValueError(f"Could not project: {pred}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a little bit confused here: why do we only raise error in IdentityTransform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. It is code that's not being hit since we can always project identity transforms. I've changed it to return None.
Co-authored-by: Honah J. <[email protected]>
elif isinstance(pred, BoundGreaterThan): | ||
return GreaterThan(Reference(name), _transform_literal(transform, boundary)) | ||
elif isinstance(pred, BoundGreaterThanOrEqual): | ||
return GreaterThan(Reference(name), _transform_literal(transform, boundary.decrement())) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I'm confused about decrement here. @Fokko
E.g. col1 is date type and transform is month, for the predicate is col1 >= 1970-01-02
, this will cause the predicate project to month(col1) > -1
, e.g. the col1 is 1970-01-01
, 1970-01-01 >= 1970-01-02
is false but
month(1970-01-01) = 0 > -1
is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you might be onto something here:
iceberg-python/tests/test_transforms.py
Lines 1006 to 1009 in 88c4bad
_test_projection( | |
transform.strict_project(name="name", pred=BoundGreaterThanOrEqual(term=bound_reference_date, literal=date)), | |
GreaterThan(term="name", literal=LongLiteral(-1)), # In Java this is human string 1970-01 | |
) |
Luckily this code is not yet used, I'll write a patch tomorrow 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ZENOTME Again, thanks for pointing this out. It turns out that most of these oddities are caused by a bug that was part of Iceberg ≤0.10.0
. We did not port this to PyIceberg: https://github.com/apache/iceberg/blob/ac6509a4e469f808bebb8b713a5c4213f98ff4a5/api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java#L275
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a PR to refactor the tests: #1422
Can be used for detecting when manifests are part of a delete operation.