feat(puffin): Add PuffinReader #892
base: main
31c248b to 4c97569
4c97569 to 62a2e6d
Thanks @fqaiser94 for this great PR, generally LGTM! Just some minor suggestions.
} | ||
|
||
/// Returns file metadata | ||
pub(crate) async fn file_metadata(&mut self) -> Result<&FileMetadata> { |
Should we consider using a `LazyLock` to initialize it? That way it would be thread-safe, and we could drop the `&mut` modifier.
I would love to be able to get rid of the `&mut` here 👍
I'm struggling to get a `LazyLock`-based implementation working, though. The main challenge is that `FileMetadata::read` is an async function :/
Open to suggestions.
I think the `async_lazy` crate could be a good fit?
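For illustration, one way to drop the `&mut self` from `file_metadata` without a lazy-initialization crate is to cache the metadata in a `std::sync::OnceLock` and perform the async read outside the cell. This is a hedged sketch, not the PR's code: `FileMetadata`, `read_metadata`, `read_count`, and `block_on` are hypothetical stand-ins (a real reader would await `FileMetadata::read` on the underlying file), the accessor is `pub` rather than `pub(crate)`, and the error type is simplified to `String`.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical stand-in for the real Puffin `FileMetadata`.
#[derive(Debug, PartialEq)]
pub struct FileMetadata {
    pub blob_count: usize,
}

/// Hypothetical reader that caches metadata so the accessor only needs `&self`.
pub struct PuffinReader {
    metadata: OnceLock<FileMetadata>,
    // Counts how many times the simulated footer read ran.
    reads: AtomicUsize,
}

impl PuffinReader {
    pub fn new() -> Self {
        Self { metadata: OnceLock::new(), reads: AtomicUsize::new(0) }
    }

    /// Simulates an async `FileMetadata::read`; a real one would await file I/O.
    async fn read_metadata(&self) -> Result<FileMetadata, String> {
        self.reads.fetch_add(1, Ordering::SeqCst);
        Ok(FileMetadata { blob_count: 2 })
    }

    /// Returns the cached metadata, reading it on first use.
    pub async fn file_metadata(&self) -> Result<&FileMetadata, String> {
        if let Some(m) = self.metadata.get() {
            return Ok(m);
        }
        let m = self.read_metadata().await?;
        // If another caller initialized the cell first, its value is kept and `m` is dropped.
        Ok(self.metadata.get_or_init(|| m))
    }

    pub fn read_count(&self) -> usize {
        self.reads.load(Ordering::SeqCst)
    }
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-threaded executor so the sketch runs without an async runtime crate.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

The trade-off of this pattern is that two tasks racing on first access may both read the footer, and only one result is stored. If exactly one read must be guaranteed, `tokio::sync::OnceCell::get_or_try_init` awaits a single async initializer and also works through `&self`.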
```rust
/// Sequence number of the Iceberg table's snapshot the blob was computed from
pub(crate) sequence_number: i64,
/// The actual blob data
pub(crate) data: Vec<u8>,
```
Should we add a comment explaining that this is always uncompressed data?
Or could we include a `CompressionCodec` in the `Blob` to indicate the compression method? (We can do this in a follow-up PR.)
> Should we add comment to explain that this is always uncompressed data?

Done.

> Or can we include a CompressionCodec in the Blob to indicate the compress method? (we can do this in a follow-up PR)
I can see where you're coming from, and while I'm not completely opposed to the idea, I'd prefer to return the uncompressed data for now. I'll also note that the Java `PuffinReader` API does the same, i.e., it always returns the data after decompression.
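To make the decompress-on-read behaviour concrete, here is a minimal std-only sketch: the reader looks up the codec recorded for the blob, decompresses, and returns a `Blob` whose `data` is always uncompressed. `BlobMetadata`, `read_blob`, and this `CompressionCodec` enum are hypothetical stand-ins, not the PR's types. The variant names follow the Puffin spec's `lz4` and `zstd` codecs, but only the uncompressed path is implemented here; the other arms return an error where a real reader would call a compression library.

```rust
/// Hypothetical codec enum mirroring the codec names in the Puffin spec.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CompressionCodec {
    None,
    Lz4,
    Zstd,
}

impl CompressionCodec {
    /// Decompresses `bytes`. Only `None` is handled in this std-only sketch.
    pub fn decompress(self, bytes: Vec<u8>) -> Result<Vec<u8>, String> {
        match self {
            CompressionCodec::None => Ok(bytes),
            other => Err(format!("{other:?} decompression is not implemented in this sketch")),
        }
    }
}

/// Hypothetical footer entry describing where one blob lives and how it is compressed.
pub struct BlobMetadata {
    pub sequence_number: i64,
    pub offset: usize,
    pub length: usize,
    pub compression_codec: CompressionCodec,
}

pub struct Blob {
    /// Sequence number of the Iceberg table's snapshot the blob was computed from
    pub sequence_number: i64,
    /// The actual blob data, always uncompressed
    pub data: Vec<u8>,
}

/// Slices the blob out of the file bytes and decompresses it, so callers never see the codec.
pub fn read_blob(file: &[u8], meta: &BlobMetadata) -> Result<Blob, String> {
    let end = meta.offset.checked_add(meta.length).ok_or("blob range overflows")?;
    let raw = file.get(meta.offset..end).ok_or("blob range out of bounds")?.to_vec();
    let data = meta.compression_codec.decompress(raw)?;
    Ok(Blob { sequence_number: meta.sequence_number, data })
}
```

Keeping the codec out of `Blob` means callers never branch on compression; the cost is that a caller cannot obtain the compressed bytes directly.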
62a2e6d to 4d64762
4d64762 to 2216a8b
Part of #744
Summary
Context