Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(puffin): Add Puffin crate and CompressionCodec #745

Merged
merged 1 commit into from
Dec 6, 2024

Conversation

fqaiser94
Copy link
Contributor

@fqaiser94 fqaiser94 commented Nov 30, 2024

Part of #744

Summary

Context

  • This is the first of a number of PRs to add support for Iceberg Puffin file format.
  • It might be helpful to refer to the overarching PR from which these changes were split to understand better how these changes will fit in to the larger picture.

Out of Scope

  • Support for LZ4 compression/decompression
    • The Java library does not currently support this either. As the Java implementations are the reference implementations in the Iceberg ecosystem, I would like implement LZ4 support on the Java side first before implementing on the Rust side. At that point, we can also easily check the Java and Rust generated outputs are bit-wise identical.

@fqaiser94 fqaiser94 mentioned this pull request Nov 30, 2024
8 tasks
@fqaiser94 fqaiser94 force-pushed the puffin_crate_and_compression branch from 8f1ae48 to 3eb4fc5 Compare November 30, 2024 22:43
@fqaiser94 fqaiser94 force-pushed the puffin_crate_and_compression branch from 3eb4fc5 to 336df2b Compare November 30, 2024 22:46
@fqaiser94 fqaiser94 marked this pull request as ready for review November 30, 2024 23:01
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me :) Thanks for working on this @fqaiser94 🙌

cc @liurenjie1024 @Xuanwo

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @fqaiser94 for working on this, and also thank you @Fokko for the review. Looks like a good start point for me. Let's move!

@Xuanwo Xuanwo merged commit bcf28b4 into apache:main Dec 6, 2024
17 checks passed
@fqaiser94 fqaiser94 deleted the puffin_crate_and_compression branch December 6, 2024 13:21
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fqaiser94 for this pr, just left a comment, others LGTM!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to put this outside of iceberg crate? I'm thinking how will we deal with PuffinReader/PuffindWriter, which will depend on FileIO?

Copy link
Contributor Author

@fqaiser94 fqaiser94 Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking how will we deal with PuffinReader/PuffindWriter, which will depend on FileIO?

I don't understand why this would cause issues since FileIO is a publicly exposed interface? 🤔

Do we really need to put this outside of iceberg crate?

That said, I think you're right.
I don't see any compelling reasons to separate this functionality into a separate crate outside of the iceberg crate.
I didn't consider this option; I just forgot 😅
Here's a quick follow-up PR to move this functionality inside of the existing iceberg crate: #789

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants