-
Notifications
You must be signed in to change notification settings - Fork 57
Support data file pruning for external iceberg tables #70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: aykut/read-only-refactor
Are you sure you want to change the base?
Support data file pruning for external iceberg tables #70
Conversation
| #define PG_LAKE_ICEBERG_SERVER_NAME "pg_lake_iceberg" | ||
|
|
||
| extern PGDLLEXPORT bool IsAnyLakeForeignTableById(Oid foreignTableId); | ||
| extern PGDLLEXPORT char *GetForeignTablePath(Oid foreignTableId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved from other location
sfc-gh-okalaci
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, interesting and useful PR!
I think we should be able to split this PR into two:
- Refactors, new utility functions, no logic change PR
- The PR that adds the external partition pruning
The first one can be useful to have before we ship #68 such that the flow follows that from start. We can discuss the details of the APIs in that PR?
a140bb6 to
0b4bbca
Compare
| { | ||
| DataFileSchema *schema = GetDataFileSchemaForTable(relationId); | ||
|
|
||
| transform->sourceField = GetDataFileSchemaFieldById(schema, specField->source_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can source field exist at an old schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We disable dropping columns if they exist in any partition spec like below:
postgres=# create table test(a int) using iceberg;
CREATE TABLE
postgres=# ALTER TABLE test OPTIONS (add partition_by 'bucket(2,a)');
ALTER TABLE
postgres=# alter table test add column b int;
ALTER TABLE
postgres=# alter table test drop column a;
ERROR: cannot drop column "a" from table test because it is used in the partition specWe cant handle such flow for external tables with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should skip such partition spec fields anyway. Those do not exist in the current postgres schema and pruning wont apply them anyway.
|
|
||
| DataFileSchemaField *schemaField = GetDataFileSchemaFieldById(schema, fieldId); | ||
|
|
||
| char *attrName = pstrdup(schemaField->name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attribute name is safe for read only external tables
a33ede2 to
407d6b6
Compare
4226886 to
f01087b
Compare
407d6b6 to
90f7ecc
Compare
f01087b to
014a78f
Compare
90f7ecc to
8b40701
Compare
3721417 to
375ef82
Compare
8b40701 to
22c7548
Compare
375ef82 to
2973ec2
Compare
22c7548 to
8b9897a
Compare
de3f2a5 to
35b125c
Compare
| CREATE TABLE test_data_file_pruning.{table_name}_object_store_read_only () | ||
| USING iceberg WITH (catalog='object_store', read_only=True, catalog_table_name='{table_name}_object_store'); | ||
| -- CREATE TABLE test_data_file_pruning.{table_name}_rest_read_only () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not push into rest catalog yet, this will work after we support it.
| CREATE TABLE test_data_file_pruning.{table_name}_object_store_read_only () | ||
| USING iceberg WITH (catalog='object_store', read_only=True, catalog_table_name='{table_name}_object_store'); | ||
| -- CREATE TABLE test_data_file_pruning.{table_name}_rest_read_only () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do not push into rest catalog yet, this will work after we support it.
35b125c to
295affc
Compare
1b05eb3 to
e3858c7
Compare
295affc to
dfb54fa
Compare
563b36c to
1e5f0b7
Compare
dfb54fa to
425c7a8
Compare
1e5f0b7 to
aa9de74
Compare
Signed-off-by: Aykut Bozkurt <[email protected]>
425c7a8 to
69a2265
Compare
Closes #55.