-
Notifications
You must be signed in to change notification settings - Fork 57
Create table in REST catalog #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
sfc-gh-okalaci
wants to merge
6
commits into
onder/register_namespace_on_rest_table_create
from
onder/stage_table_in_polaris
Closed
Create table in REST catalog #47
sfc-gh-okalaci
wants to merge
6
commits into
onder/register_namespace_on_rest_table_create
from
onder/stage_table_in_polaris
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We have Polaris running as part of regression tests. Polaris 1.2 is released, and let's try to be up-to-date with Polaris. There is one relatively important change that we are impacted by: ``` With version 1.2, creating or altering a namespace with a custom location outside its parent location is now prohibited by default. To restore the old behavior, you can set the ALLOW_NAMESPACE_CUSTOM_LOCATION flag to true. ``` We used to set the namepsace location by ourselves. Instead, let's rely on Polaris, which by default uses: ``` <warehouse>/<namespace>/<table>/ ``` And, that's what we look for anyway. The `warehouse` is `database name` for our purposes.
| * TODO: Should we make this configurable? Some object stores may require | ||
| * different headers or authentication methods. | ||
| */ | ||
| char *vendedCreds = pstrdup("X-Iceberg-Access-Delegation: vended-credentials"); |
Collaborator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we use vended credentials?
Collaborator
Author
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet, but I didn't want to end up trying to plug-in later on
9e67f43 to
02fcb9a
Compare
22 tasks
Collaborator
Author
This was referenced Nov 17, 2025
sfc-gh-okalaci
added a commit
that referenced
this pull request
Dec 3, 2025
In this PR, we add support for writable REST catalog tables. The API may still change, so I refrain documenting it in the PR description. Once we finalise the APIs, the best way would be to put under https://github.com/Snowflake-Labs/pg_lake/blob/main/docs/iceberg-tables.md, so stay tuned for that. In the earlier set of PRs (#47, #49, #51, #52 and #56) we prototyped adding support for writable rest catalog in different stages. However, it turns out that a single PR is better to tackle these very much related commits. It seemed overkill to maintain these set of PRs. This new table type shares almost the same architecture with iceberg tables `catalog=postgres`. #### Similarities - Tables are tracked the `pg_lake` catalogs such as `lake_table.files`, `lake_table.data_file_column_stats` and `lake_iceberg.tables_internal`. - All metadata handling follows `ApplyIcebergMetadataChanges()` logic. Instead of generating a new `metadata.json` as we do for `catalog=postgres`, for these tables we collect the changes happened in the transaction, and apply to the REST catalog right after it is committed in Postgres. #### Differences - The `metadata_location` column in `lake_iceberg.tables_internal` is always `NULL` - Does not support RENAME TABLE / SET SCHEMA etc. #### Some other notes on the implementation & design: - We first `COMMIT` in Postgres, then in `post-commit` hook, send a `POST` request to REST catalog. So, it is possible that the changes are committed in Postgres, but not in REST catalog. This is a known limitation, and we'll have follow-up PRs to make sure we can recover from this situation. - Creating a table and modifying it in the same Postgres transaction cannot be committed atomically in REST catalog. There is no such API in REST catalog. So, there are some additional error scenarios where table creation committed in REST catalog, say not the full CTAS. This is an unfortunate limitation that we inherit from REST catalog APIs. - Our implementation currently assumes that the Postgres is the single-writer to this table in the REST catalog. So, a concurrent modification breaks the table from Postgres side. For now, this is the current state. We plan to improve it in the future. #### TODO: - [x] `DROP partition_by` is not working (fixed by #79) - [x] Concurrency - [x] Certain DDLs do not work (e.g., ADD COLUMN with defaults), prevent much earlier - [x] VACUUM regression tests - [x] VACUUM failures (e.g., do we clean up properly?) - [x] VACUUM (ICEBERG) - [x] auto-vacuum test - [x] Truncate test - [x] savepoint - [x] Complex TX test - [x] Column names with quotes - [x] Add column + add partition by + drop column in the same tx - [x] Tests for read from postgres / iceberg, modify REST (or the other way around) - [x] Zero column table? - [x] DROP TABLE implemented, but needs tests (e.g., create - drop in the same tx, drop table removes the metadata from rest catalog etc). - [x] `SET partition_by` to an already existing partition by is not supported in Polaris. We should skip sending such requests, instead only send `set partition_spec` alone. (fixed by #79) - [ ] Recovery after failures (e.g., re-sync the previous snapshot/DDL) [Follow-up PR needed] - [x] Cache access token, currently we fetch on every REST request interaction [Follow-up PR needed] - [x] Cancel query - [x] sequences / serial / generated columns etc. - [x] All data types - [ ] Docs [Follow-up PR needed]
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a draft PR that doesn't consider failure scenarios thoroughly.
Implement
create tablein REST catalog. The idea is to use/v1/{prefix}/namespaces/{namespace}/tablesend-point to create a table.We follow 2 step approach: In the first step, we
stage-createthe table, then we doassert-createin post-commit.The reason for this approach two stage approach is to be able to vended credentials in case of CTAS.