-
Notifications
You must be signed in to change notification settings - Fork 260
feat: Support TimestampNs
and TimestampTzNs` in bucket transform
#1150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonathanc-n for this pr, LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -167,6 +167,16 @@ impl TransformFunction for Bucket { | |||
.downcast_ref::<arrow_array::TimestampMicrosecondArray>() | |||
.unwrap() | |||
.unary(|v| self.bucket_timestamp(v)), | |||
DataType::Time64(TimeUnit::Nanosecond) => input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in pyiceberg, TimestampNanoType
in converted to micros precision before bucketing. This is to ensure that TimestampType
and TimestampNanoType
are transformed to the same value
I think we should add a test to ensure this here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, It wasn't performing this conversion. The test is added
@Fokko FYI new transform function we can use for apache/iceberg-python#1833 |
54d001b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
.as_any() | ||
.downcast_ref::<arrow_array::Time64NanosecondArray>() | ||
.unwrap() | ||
.unary(|v| self.bucket_time(v / 1000)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was an issue with rounding when performing division for the hour transform (#1146)
but since we're dividing by 1000
, i dont think the same issue applies here
.transform_literal(&Datum::timestamp_nanos(ns_value)) | ||
.unwrap() | ||
.unwrap(), | ||
Datum::int(79) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, since we're using the same time (1510871468000000
) here
should the bucket value for bucket(100)
transform be the same for this test and the test above (test_timestamptz_literal
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I applied the conversion to the transform_literal
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @jonathanc-n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks for adding the additional checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonathanc-n for fixing this, and @kevinjqliu for the review
Which issue does this PR close?
What changes are included in this PR?
Add bucket transforms for
TimestampNs
and TimestampTzNs`Are these changes tested?
Unit tests