-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support merge manifests on writes (MergeAppend) #363
Merged
Merged
Changes from 11 commits
Commits
Show all changes
22 commits
Select commit
Hold shift + click to select a range
6803eba
add ListPacker + tests
HonahX f0fc260
add merge append
HonahX cbb8cec
add merge_append
HonahX bf63c03
fix snapshot inheritance
HonahX 9dd69af
test manifest file and entries
HonahX 4921a7f
add doc
HonahX 984ca41
fix lint
HonahX 8510f71
change test name
HonahX 1ec5edd
Merge branch 'main' into manifest_compaction
HonahX a7da318
address review comments
HonahX c4feda5
rename _MergingSnapshotProducer to _SnapshotProducer
HonahX 7e6e1d4
Merge branch 'main' into manifest_compaction
HonahX 123f5d3
Merge branch 'main' into manifest_compaction
HonahX 9777e9b
fix a serious bug
HonahX 3393757
Merge branch 'main' into manifest_compaction
HonahX 66dddbe
update the doc
HonahX aff1bea
remove merge_append as public API
HonahX 7625857
make default to false
HonahX 3e3a1b4
add test description
HonahX 914d6ef
Merge branch 'main' into manifest_compaction
HonahX 71a5fe0
fix merge conflict
HonahX c7e4095
fix snapshot_id issue
HonahX File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -273,6 +273,10 @@ tbl.append(df) | |
|
||
# or | ||
|
||
tbl.merge_append(df) | ||
|
||
# or | ||
|
||
tbl.overwrite(df) | ||
``` | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reluctant to expose this to the public API for a couple of reasons:
How about having something similar as in Java, to control this using a table property: https://iceberg.apache.org/docs/1.5.2/configuration/#table-behavior-properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great! I am also +1 on let it controlled by the config. I made
merge_append
a separate API to mirror the Java side implementation, which hasnewAppend
andnewFastAppend
APIs. But it seems better to just make thecommit.manifest-merge.enabled
default toFalse
on python side.I will still keep FastAppend and MergeAppend as separate class, and keep
merge_append
inUpdateSnapshot
class to ensure clarity, although the current MergeAppend is purely FastAppend + manifest merge.Just curious, why not Java side
newAppend
return anFastAppend
impl whencommit.manifest-merge.enabled
is False. Is it due to some backward compatibiilty issue?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think the use-case of the Java library is slightly different, since that's mostly used in query engines.
I think it is for historical reasons, since the fast-append was added later on :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, I like how you split it out in classes, it is much cleaner now 👍