-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support scan nested type(struct, map, list) #882
Conversation
17b7645
to
d744a4d
Compare
This is looking great, especially now that we have this really comprehensive integration test. Just those two small typos to fix and I'll happily approve - thanks! |
Thanks @sdd! I have fixed the name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks
b04342a
to
c44311a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub(crate) const DEFAULT_MAP_FIELD_NAME: &str = "key_value"; | ||
pub const DEFAULT_MAP_FIELD_NAME: &str = "key_value"; | ||
/// UTC time zone for Arrow timestamp type. | ||
pub const UTC_TIME_ZONE: &str = "+00:00"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this also required to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. When users provide the timestamp data, they should set the time zone consistent with the iceberg. I think we can provide something to help user fill the metadata later.🤔
crates/iceberg/src/spec/datatypes.rs
Outdated
@@ -226,8 +228,10 @@ pub enum PrimitiveType { | |||
/// Timestamp in microsecond precision, with timezone | |||
Timestamptz, | |||
/// Timestamp in nanosecond precision, without timezone | |||
#[serde(rename = "timestamp_ns")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, but why is it related with reading complex type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scan_all_type.rs find this bug and I fix it here. I can separate it out of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to do it in another pr with some tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have separate out this to #905. Let's merge this first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! @Xuanwo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add some more test cases? I think we are missing handling the case in #405
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @liurenjie1024, could you elaborate which part this PR miss? This PR is not intent to complete #405. It only support nest type but not the projected nested filed of structs. |
crates/iceberg/src/scan.rs
Outdated
let field = schema | ||
.as_struct() | ||
.field_by_id(field_id) | ||
.ok_or_else(|| { | ||
Error::new( | ||
ErrorKind::FeatureUnsupported, | ||
format!( | ||
"Column {} is not a direct child of schema but a nested field, which is not supported now. Schema: {}", | ||
column_name, schema | ||
), | ||
) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @ZENOTME If we only want to support nested types without supporting deeply nested types, we can't remove this check.
pub(crate) const DEFAULT_MAP_FIELD_NAME: &str = "key_value"; | ||
pub const DEFAULT_MAP_FIELD_NAME: &str = "key_value"; | ||
/// UTC time zone for Arrow timestamp type. | ||
pub const UTC_TIME_ZONE: &str = "+00:00"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this also required to be public?
c44311a
to
cc0de5b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @ZENOTME for this great pr!
This PR support to scan nested type