feat/3324-adding-iceberg-catalogs-compatibilities #3325
Description
This PR removes the hardcoded in-memory SQLite catalog limitation and enables dlt to work with all PyIceberg-supported catalog types, including REST catalogs, by leveraging PyIceberg's `load_catalog` functionality. The execution and operation of the Iceberg format remain the same; we just load the catalog and pass it along to the pipeline.

Previously, the Iceberg destination hardcoded catalog creation to use `sqlite:///:memory:`, which limited Iceberg support.

This PR leverages PyIceberg's built-in `load_catalog()` function to support all catalog types through standard configuration methods. The implementation provides a priority chain that tries multiple configuration sources and falls back gracefully.

In essence, this allows the user to add a `.pyiceberg.yaml` file that references their catalogs, which is in line with how PyIceberg itself is configured. The changes do not require the user to do much more than that, and they can continue using Iceberg as they were.
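For reviewers, the fallback idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (`load_first_available`, `from_pyiceberg_config`, `from_in_memory_sqlite`) and the choice and order of sources are assumptions. Only `pyiceberg.catalog.load_catalog` is PyIceberg's own API.

```python
from typing import Any, Callable, List, Sequence

# A catalog source is any zero-argument callable that returns a catalog,
# or raises if its configuration is missing or invalid.
CatalogSource = Callable[[], Any]


def load_first_available(sources: Sequence[CatalogSource]) -> Any:
    """Return the catalog from the first source that loads (hypothetical helper)."""
    errors: List[Exception] = []
    for source in sources:
        try:
            return source()
        except Exception as exc:  # this source failed; fall through to the next
            errors.append(exc)
    raise RuntimeError(f"No catalog configuration could be loaded: {errors!r}")


def from_pyiceberg_config(name: str) -> Any:
    # load_catalog(name) lets PyIceberg resolve the catalog's properties from
    # .pyiceberg.yaml or PYICEBERG_CATALOG__* environment variables.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name)


def from_in_memory_sqlite(name: str, warehouse: str) -> Any:
    # Assumed last-resort fallback that mirrors the previous hardcoded behaviour.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, type="sql", uri="sqlite:///:memory:", warehouse=warehouse)


# Example wiring (the catalog name and warehouse path are placeholders):
# catalog = load_first_available([
#     lambda: from_pyiceberg_config("default"),
#     lambda: from_in_memory_sqlite("default", "file:///tmp/warehouse"),
# ])
```

Keeping each source as a plain callable means the chain can be exercised in tests without a real catalog, and the in-memory SQLite path stays available for users who have no PyIceberg configuration at all.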
Related Issues
Additional Context
To test it, I ran this for some time by monkey-patching the core package. I then added my contributions, built the package, and tried it again on two of my production pipelines.
As for the tests, I ran them locally and came across some that I think are broken, since they had nothing to do with my changes. I am going to check them in more detail tomorrow.