feat: support scan nested type(struct, map, list) #882

Merged: 4 commits, Jan 22, 2025

Conversation

Contributor

@ZENOTME ZENOTME commented Jan 7, 2025

This PR adds support for scanning nested types.

Contributor Author

ZENOTME commented Jan 7, 2025

cc @liurenjie1024 @Xuanwo @Fokko @sdd

@ZENOTME ZENOTME requested a review from sdd January 9, 2025 09:31
Contributor

sdd commented Jan 13, 2025

This is looking great, especially now that we have this really comprehensive integration test. Just those two small typos to fix and I'll happily approve - thanks!

Contributor Author

ZENOTME commented Jan 13, 2025

Thanks @sdd! I have fixed the name.

@ZENOTME ZENOTME requested a review from sdd January 13, 2025 08:35
sdd previously approved these changes Jan 13, 2025
Contributor

@sdd sdd left a comment

Looks great! Thanks

Contributor

@liurenjie1024 liurenjie1024 left a comment

Thanks @ZENOTME for this PR! I think it's a step forward, but it doesn't handle nested struct types well; see #405.

crates/iceberg/src/arrow/schema.rs
- pub(crate) const DEFAULT_MAP_FIELD_NAME: &str = "key_value";
+ pub const DEFAULT_MAP_FIELD_NAME: &str = "key_value";
/// UTC time zone for Arrow timestamp type.
pub const UTC_TIME_ZONE: &str = "+00:00";
Contributor

Is this also required to be public?

Contributor Author

Yes. When users provide timestamp data, they should set a time zone consistent with Iceberg. I think we can provide something later to help users fill in the metadata. 🤔
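To make the point about time zone consistency concrete, here is a minimal, standard-library-only sketch. Only the `UTC_TIME_ZONE` constant and its value come from the diff above; `TimestampColumn`, `timestamptz_column`, and `has_iceberg_time_zone` are hypothetical names invented for illustration and are not part of the crate. The check is a strict string comparison, and the real conversion code may accept other spellings of UTC.

```rust
/// Value copied from the diff under review.
pub const UTC_TIME_ZONE: &str = "+00:00";

/// Hypothetical stand-in for the type metadata of a timestamp column.
#[derive(Debug, Clone, PartialEq)]
pub struct TimestampColumn {
    pub name: String,
    /// `None` models a timestamp without a time zone.
    pub time_zone: Option<String>,
}

/// Hypothetical helper: build a timestamptz column tagged with the shared constant.
pub fn timestamptz_column(name: &str) -> TimestampColumn {
    TimestampColumn {
        name: name.to_string(),
        time_zone: Some(UTC_TIME_ZONE.to_string()),
    }
}

/// Hypothetical check: true only when the zone string equals `UTC_TIME_ZONE` exactly.
pub fn has_iceberg_time_zone(col: &TimestampColumn) -> bool {
    col.time_zone.as_deref() == Some(UTC_TIME_ZONE)
}

fn main() {
    let good = timestamptz_column("created_at");
    let bad = TimestampColumn {
        name: "created_at".to_string(),
        time_zone: Some("Europe/Berlin".to_string()),
    };
    println!("{} consistent: {}", good.name, has_iceberg_time_zone(&good));
    println!("{} consistent: {}", bad.name, has_iceberg_time_zone(&bad));
}
```

With the constant public, user code can reference it instead of repeating the literal string, which is the motivation given in the reply above.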

crates/iceberg/src/spec/datatypes.rs
@@ -226,8 +228,10 @@ pub enum PrimitiveType {
/// Timestamp in microsecond precision, with timezone
Timestamptz,
/// Timestamp in nanosecond precision, without timezone
#[serde(rename = "timestamp_ns")]
Contributor

This is correct, but how is it related to reading complex types?

Contributor Author

The scan_all_type.rs test found this bug, so I fixed it here. I can split it out of this PR.

Contributor

I would suggest doing it in another PR with some tests.

Contributor Author

I have split this out into #905. Let's merge that one first.

Member

Merged.

Contributor Author

Thanks! @Xuanwo

Contributor

Could we add some more test cases? I think we are not handling the case in #405.

Contributor Author

ZENOTME commented Jan 20, 2025

I think it's a step forward, but it doesn't handle nested struct types well; see #405.

Hi @liurenjie1024, could you elaborate on which part this PR misses? This PR is not intended to complete #405. It only supports scanning nested types, not projecting nested fields of structs.

Comment on lines 251 to 262
let field = schema
    .as_struct()
    .field_by_id(field_id)
    .ok_or_else(|| {
        Error::new(
            ErrorKind::FeatureUnsupported,
            format!(
                "Column {} is not a direct child of schema but a nested field, which is not supported now. Schema: {}",
                column_name, schema
            ),
        )
    })?;
Contributor

Hi @ZENOTME, if we only want to support nested types without supporting deeply nested types, we can't remove this check.
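For readers following the thread, the check under discussion can be sketched in plain Rust as below. This is a simplified model, not the crate's code: `Field`, `LookupError`, `contains_id`, `top_level_field`, and `sample_schema` are all hypothetical names, and splitting the failure into "nested" and "unknown" cases is a choice made for this sketch (the quoted snippet reports a single `FeatureUnsupported` error).

```rust
/// Hypothetical, simplified schema field; the real crate's types are richer.
#[derive(Debug, Clone)]
pub struct Field {
    pub id: i32,
    pub name: String,
    /// Child fields; non-empty for a struct-typed field.
    pub children: Vec<Field>,
}

#[derive(Debug, PartialEq)]
pub enum LookupError {
    /// The id belongs to a field nested inside a struct, not a top-level column.
    NestedFieldUnsupported { field_id: i32 },
    /// The id does not occur anywhere in the schema.
    UnknownField { field_id: i32 },
}

/// Returns true if `id` occurs at any depth below `fields`.
fn contains_id(fields: &[Field], id: i32) -> bool {
    fields
        .iter()
        .any(|f| f.id == id || contains_id(&f.children, id))
}

/// Resolve a projected column id against the direct children of the schema only.
pub fn top_level_field(schema: &[Field], field_id: i32) -> Result<&Field, LookupError> {
    if let Some(field) = schema.iter().find(|f| f.id == field_id) {
        return Ok(field);
    }
    if contains_id(schema, field_id) {
        Err(LookupError::NestedFieldUnsupported { field_id })
    } else {
        Err(LookupError::UnknownField { field_id })
    }
}

/// Example schema: `id` (1) and a struct column `person` (2) with child `name` (3).
pub fn sample_schema() -> Vec<Field> {
    let leaf = |id: i32, name: &str| Field {
        id,
        name: name.to_string(),
        children: Vec::new(),
    };
    vec![
        leaf(1, "id"),
        Field {
            id: 2,
            name: "person".to_string(),
            children: vec![leaf(3, "name")],
        },
    ]
}

fn main() {
    let schema = sample_schema();
    // Selecting the whole struct column is allowed.
    println!("{:?}", top_level_field(&schema, 2).map(|f| f.name.clone()));
    // Selecting a field inside the struct is rejected.
    println!("{:?}", top_level_field(&schema, 3).map(|f| f.name.clone()));
}
```

In this model, projecting the whole nested column (`person`) succeeds, while projecting only `person.name` is rejected, which matches the scope described earlier in the thread.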

Contributor

@liurenjie1024 liurenjie1024 left a comment

LGTM, thanks @ZENOTME for this great PR!

@liurenjie1024 liurenjie1024 merged commit efca9f0 into apache:main Jan 22, 2025
18 checks passed
@ZENOTME ZENOTME deleted the support_nested branch January 22, 2025 09:35