Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Strict projection #539

Merged
merged 3 commits into from
Mar 25, 2024
Merged

Add Strict projection #539

merged 3 commits into from
Mar 25, 2024

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Mar 21, 2024

Can be used for detecting when manifests are part of a delete operation.

Copy link
Contributor

@HonahX HonahX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks @Fokko ! Just some tiny comments.

pyiceberg/expressions/visitors.py Outdated Show resolved Hide resolved
elif isinstance(pred, BoundSetPredicate):
return pred.as_unbound(Reference(name), pred.literals)
else:
raise ValueError(f"Could not project: {pred}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little bit confused here: why do we only raise error in IdentityTransform?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. It is code that's not being hit since we can always project identity transforms. I've changed it to return None.

@Fokko Fokko merged commit f1c2ef2 into apache:main Mar 25, 2024
7 checks passed
@Fokko Fokko deleted the fd-strict-projection branch March 25, 2024 12:08
@Fokko Fokko mentioned this pull request Mar 27, 2024
elif isinstance(pred, BoundGreaterThan):
return GreaterThan(Reference(name), _transform_literal(transform, boundary))
elif isinstance(pred, BoundGreaterThanOrEqual):
return GreaterThan(Reference(name), _transform_literal(transform, boundary.decrement())) # type: ignore
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm confused about decrement here. @Fokko
E.g. col1 is date type and transform is month, for the predicate is col1 >= 1970-01-02, this will cause the predicate project to month(col1) > -1, e.g. the col1 is 1970-01-01, 1970-01-01 >= 1970-01-02 is false but
month(1970-01-01) = 0 > -1 is true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you might be onto something here:

_test_projection(
transform.strict_project(name="name", pred=BoundGreaterThanOrEqual(term=bound_reference_date, literal=date)),
GreaterThan(term="name", literal=LongLiteral(-1)), # In Java this is human string 1970-01
)

https://github.com/apache/iceberg/blob/d402f83fc7b224b21242c506cf503e5bcbc8c867/api/src/test/java/org/apache/iceberg/transforms/TestDatesProjection.java#L139-L140

Luckily this code is not yet used, I'll write a patch tomorrow 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ZENOTME Again, thanks for pointing this out. It turns out that most of these oddities are caused by a bug that was part of Iceberg ≤0.10.0. We did not port this to PyIceberg: https://github.com/apache/iceberg/blob/ac6509a4e469f808bebb8b713a5c4213f98ff4a5/api/src/main/java/org/apache/iceberg/transforms/ProjectionUtil.java#L275

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a PR to refactor the tests: #1422

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants