Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the InMemory Catalog Implementation #289
Improve the InMemory Catalog Implementation #289
Changes from 20 commits
066e8c4
b1a99f7
32c449e
3c6e06a
21b1e50
e2541ac
3013bdb
be3eb1c
f80849a
faea973
4437a30
aabcde0
b91e1cf
2834bae
341f5ba
67c028a
96ba8de
c7f9053
8a7b876
10adb1c
a466e5f
03ec82b
58b34ca
810c3c5
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some
test_console.py
outputs are too long and end up with an extra\n
in the middle of the string, causing tests to failThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added ability to write to temporary files for testing, which is then automatically cleaned up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@syun64 FYI, I realized that the
TEST_TABLE_PARTITION_SPEC
here breaks this test.The partition field's
source_id
here is1
, but increate_table
theschema
'sfield_id
s are all-1
due to_convert_schema_if_needed
So
assign_fresh_partition_spec_ids
failsiceberg-python/pyiceberg/partitioning.py
Lines 203 to 204 in 102e043
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kevinjqliu thank you for flagging this 😄 I think '-1' ID discrepancy is the symptom of the issue that makes the issue easy to understand, just as we decided in #305 (comment)
The root cause of the issue I think is that we are introducing a way for non-ID's schema (PyArrow Schema) to be used as an input into create_table, while not supporting the same for partition_spec and sort_order (PartitionField and SortField both require field IDs as inputs).
So I think we should update both assign_fresh_partition_spec_ids and assign_fresh_sort_order_ids to support field look up by name.
@Fokko - does that sound like a good way to resolve this issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #338 to track this issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you @syun64 that for creating tables having to look up the IDs is not ideal. Probably that API has to be extended at some point.
But for the metadata (and also how Iceberg internally tracks columns, since names can change; IDs not), we need to track it by ID. I'm in doubt if assigning
-1
was the best idea because that will give you a table that you cannot work with. Thanks for creating the issue, and let's continue there.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good @Fokko 👍 and thanks again for flagging this @kevinjqliu !