Writable Iceberg tables with REST catalog support #68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Especially useful for #68
Both Spark and Polaris follow this. Before this commit, we blindly added partition specs, even if the same spec existed before. Now we switch to a more proper approach: if the same spec already exists in the table, we simply reuse it instead of adding one more spec. This is especially useful for #68, where Polaris throws an error if we try to send a partition spec that already exists.

Here `the same spec` means a spec that has exactly the same fields as an existing spec in the table. For example, the following previously created 3 specs; now it creates 2:

```sql
ALTER TABLE t OPTION (ADD partition_by='bucket(10,a)');
..
ALTER TABLE t OPTION (ADD partition_by='bucket(20,a)');
..
-- back to 10, this does not generate a new spec anymore
ALTER TABLE t OPTION (ADD partition_by='bucket(10,a)');
```
6c9cee7 to
1e80069
Compare
Both Spark and Polaris follow this. Before this commit, we blindly added partition specs, even if the same spec existed before. Now we switch to a more proper approach: if the same spec already exists in the table, we simply reuse it instead of adding one more spec. This is especially useful for #68, where Polaris throws an error if we try to send a partition spec that already exists.

Here `the same spec` means a spec that has exactly the same fields as an existing spec in the table. For example, the following previously created 3 specs; now it creates 2:

```sql
ALTER TABLE t OPTION (ADD partition_by='bucket(10,a)');
..
ALTER TABLE t OPTION (ADD partition_by='bucket(20,a)');
..
-- back to 10, this does not generate a new spec anymore
ALTER TABLE t OPTION (ADD partition_by='bucket(10,a)');
```

On Polaris, before this commit, we'd get the following error, because Polaris thinks this spec already exists:

```
alter table t (set partition_by 'bucket(10,a)');
WARNING:  HTTP request failed (HTTP 400)
DETAIL:  Cannot set last added spec: no spec has been added
HINT:  The rest catalog returned error type: ValidationException
```

Signed-off-by: Onder KALACI <[email protected]>
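The dedup rule described in this commit message (reuse an existing spec whenever its fields match exactly, instead of adding a new one) can be sketched as follows. This is an illustrative Python model, not the actual pg_lake implementation; class and method names are invented:

```python
# Sketch of the partition-spec dedup described above: a spec is "the same"
# when it has exactly the same fields as an existing spec, in which case we
# reuse its id rather than appending one more spec to the table metadata.

class PartitionSpecs:
    def __init__(self):
        self.specs = {}   # spec_id -> tuple of partition fields
        self.next_id = 0

    def set_partition_by(self, fields):
        """Return the spec id for `fields`, reusing an identical existing spec."""
        fields = tuple(fields)
        for spec_id, existing in self.specs.items():
            if existing == fields:
                return spec_id  # reuse: no new spec is added
        spec_id = self.next_id
        self.specs[spec_id] = fields
        self.next_id += 1
        return spec_id

specs = PartitionSpecs()
a = specs.set_partition_by(["bucket(10,a)"])
b = specs.set_partition_by(["bucket(20,a)"])
c = specs.set_partition_by(["bucket(10,a)"])  # back to 10: reuses spec `a`
```

With this in place the third `ALTER TABLE` from the example above resolves to the first spec's id, so only two specs ever exist in the table.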
```c
static bool
AlterTableAddColumnDefaultForRestTable(Oid relationId, AlterTableCmd *cmd)
```
Should this be named something like `IsAlterTableAddColumnDefaultForRestTable`?
```c
if (renameStmt->renameType == OBJECT_SCHEMA)
{
	/* we are in post-process, use new name */
	ErrorIfAnyRestCatalogTablesInSchema(renameStmt->newname);
```
Should we guard this by checking that the extension exists? Otherwise we might break plain Postgres. We normally avoid that by checking FDW names and such.
```c
while ((requestPerTable = hash_seq_search(&status)) != NULL)
{
	/* TODO: can we ever have multiple catalogs? */
```
Not an unreasonable assumption, and one we can ignore for now, but we could one day end up wanting tables with custom endpoints (probably not specifically catalogs).
```c
void
PostAllRestCatalogRequests(void)
```
Probably good to add some comments here, especially noting that this function should never error.
sfc-gh-mslot left a comment:
Overall looks pretty good.
I like being able to do things like `create table test_rest using iceberg with (catalog = 'rest') as select 5;` and then query using other tools.
```diff
 {
 	bool forUpdate = false;
-	char *columnName = "metadata_location";
+	char *columnName = "table_name";
```
Probably the variable name below should also change then.
```c
requestPerTable->relationId = relationId;

requestPerTable->catalogName = MemoryContextStrdup(TopTransactionContext, GetRestCatalogName(relationId));
requestPerTable->catalogNamespace = MemoryContextStrdup(TopTransactionContext, GetRestCatalogNamespace(relationId));
```
Some minor OOM hazards here in subtransactions, maybe; it might be good to add an `isValid` flag.
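The hazard being pointed at is that an allocation failure between creating the hash entry and filling its fields can leave a half-initialized entry behind. The suggested `isValid` flag pattern can be sketched like this; it is an illustrative Python model with invented names, standing in for the C hash-table code:

```python
# Sketch of the "isValid flag" pattern suggested above (names invented):
# the entry is marked valid only after every field is populated, so a
# failure mid-initialization (e.g. OOM in the C version) leaves it flagged
# invalid and later processing can skip it safely.

class RequestPerTable:
    def __init__(self, relation_id):
        self.relation_id = relation_id
        self.catalog_name = None
        self.catalog_namespace = None
        self.is_valid = False  # set last, after all fields are filled

def register_request(table_hash, relation_id, lookup_name, lookup_namespace):
    entry = RequestPerTable(relation_id)
    table_hash[relation_id] = entry       # entry visible but not yet valid
    entry.catalog_name = lookup_name(relation_id)        # may fail
    entry.catalog_namespace = lookup_namespace(relation_id)  # may fail
    entry.is_valid = True                 # only now is the entry safe to use
    return entry

def failing_lookup(oid):
    raise MemoryError("simulated allocation failure")

requests = {}
register_request(requests, 42, lambda oid: "polaris", lambda oid: "ns1")
try:
    register_request(requests, 43, lambda oid: "polaris", failing_lookup)
except MemoryError:
    pass  # aborted subtransaction leaves a half-built entry behind

valid = [e for e in requests.values() if e.is_valid]
```

The consumer loop then simply skips entries whose flag is unset, instead of dereferencing fields that were never assigned.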
In this PR, we add support for writable REST catalog tables. The API may still change, so I'll refrain from documenting it in the PR description. Once we finalize the APIs, the best place for documentation would be under https://github.com/Snowflake-Labs/pg_lake/blob/main/docs/iceberg-tables.md, so stay tuned for that.
In an earlier set of PRs (#47, #49, #51, #52 and #56) we prototyped support for a writable REST catalog in separate stages. However, it turned out that a single PR is a better way to tackle these closely related commits; maintaining that set of PRs seemed like overkill.
This new table type shares almost the same architecture with iceberg tables with `catalog=postgres`.

Similarities:
- Uses the same `pg_lake` catalogs, such as `lake_table.files`, `lake_table.data_file_column_stats` and `lake_iceberg.tables_internal`.
- Uses the same `ApplyIcebergMetadataChanges()` logic. Instead of generating a new `metadata.json` as we do for `catalog=postgres`, for these tables we collect the changes that happened in the transaction and apply them to the REST catalog right after the transaction is committed in Postgres.

Differences:
- The `metadata_location` column in `lake_iceberg.tables_internal` is always `NULL`.

Some other notes on the implementation & design:
- The changes are first committed via `COMMIT` in Postgres; then, in the post-commit hook, we send a `POST` request to the REST catalog. So it is possible that the changes are committed in Postgres but not in the REST catalog. This is a known limitation, and we'll have follow-up PRs to make sure we can recover from this situation.

TODO:
- `DROP partition_by` is not working (fixed by "Do not add a partition spec if it already exists" #79)
- `SET partition_by` to an already existing partition spec is not supported in Polaris. We should skip sending such requests, and instead only send `set partition_spec` alone. (fixed by "Do not add a partition spec if it already exists" #79)
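The deferred-apply flow described in the notes above (collect changes during the transaction, commit in Postgres, then push to the REST catalog in a post-commit hook) can be sketched as follows. This is an illustrative Python model with invented names, not the pg_lake code; the known limitation shows up as a possible divergence when the POST fails:

```python
# Sketch of the commit flow described above (names invented): metadata
# changes are queued during the transaction and only sent to the REST
# catalog after Postgres commits. A failed POST therefore leaves the two
# sides diverged, which is the known limitation noted in the PR.

class RestCatalogSession:
    def __init__(self, post_fn):
        self.pending = []       # changes collected during the transaction
        self.committed = []     # what Postgres has durably committed
        self.rest_catalog = []  # what the REST catalog has seen
        self.post_fn = post_fn

    def queue_change(self, change):
        self.pending.append(change)

    def commit(self):
        # 1. commit in Postgres first
        self.committed.extend(self.pending)
        changes, self.pending = self.pending, []
        # 2. post-commit hook: must never raise, only note the failure
        try:
            for change in changes:
                self.post_fn(change)
                self.rest_catalog.append(change)
        except Exception:
            pass  # Postgres already committed; recovery is follow-up work

ok = RestCatalogSession(lambda change: None)
ok.queue_change("add-snapshot")
ok.commit()

def failing_post(change):
    raise IOError("simulated HTTP 400 from the REST catalog")

broken = RestCatalogSession(failing_post)
broken.queue_change("add-snapshot")
broken.commit()
# broken.committed has the change, broken.rest_catalog does not: divergence
```

This also makes concrete why the reviewer asked that the post-commit function never error: by the time the hook runs, the Postgres commit cannot be rolled back, so raising would only abort the backend without undoing anything.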