@lfagliano

Description

This PR removes the hardcoded in-memory SQLite catalog limitation and enables DLT to work with all PyIceberg-supported catalog types, including REST catalogs, by leveraging PyIceberg's load_catalog functionality. The actual execution and operation of the Iceberg format remains the same; we just load the catalog and pass it along to the pipeline.

Previously, the Iceberg destination hardcoded catalog creation to use sqlite:///:memory:, limiting Iceberg support:

catalog = get_sql_catalog(
    catalog_name or "default", 
    "sqlite:///:memory:",  # No other options!
    self.config.credentials
)

This PR leverages PyIceberg's built-in load_catalog() function to support all catalog types through standard configuration methods. The implementation provides a priority chain that tries multiple configuration sources in order and falls back gracefully when none is found.
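The priority chain can be sketched roughly like this. This is a minimal illustration, not dlt's actual implementation; the helper name resolve_catalog_config and the exact ordering are hypothetical. In practice PyIceberg's load_catalog() itself resolves PYICEBERG_CATALOG__* environment variables and ~/.pyiceberg.yaml:

```python
import os

def resolve_catalog_config(name, explicit=None, yaml_config=None):
    """Hypothetical sketch of a catalog-config priority chain:
    explicit credentials > PYICEBERG_CATALOG__* env vars >
    .pyiceberg.yaml entries > in-memory SQLite fallback."""
    # 1. Explicitly configured properties win.
    if explicit:
        return explicit
    # 2. PyIceberg-style environment variables, e.g.
    #    PYICEBERG_CATALOG__PRODUCTION__URI=https://...
    prefix = f"PYICEBERG_CATALOG__{name.upper()}__"
    env = {
        key[len(prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }
    if env:
        return env
    # 3. A parsed ~/.pyiceberg.yaml mapping, if the catalog is listed there.
    if yaml_config and name in yaml_config.get("catalog", {}):
        return yaml_config["catalog"][name]
    # 4. Fall back to the previous hardcoded in-memory SQLite behavior.
    return {"type": "sql", "uri": "sqlite:///:memory:"}
```

The resolved properties would then be handed to PyIceberg's load_catalog(name, **properties) rather than constructing a SqlCatalog directly.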

In essence, this allows the user to add a .pyiceberg.yaml file referencing their catalogs (in line with PyIceberg's own configuration conventions), similar to this:

# ~/.pyiceberg.yaml
catalog:
  production:
    type: rest
    uri: https://localhost:8181/catalog
    warehouse: analytics
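The same catalog can alternatively be configured through PyIceberg's environment-variable convention, shown here for the "production" catalog from the YAML example above:

```shell
# Equivalent environment-variable configuration
# (PyIceberg's PYICEBERG_CATALOG__<NAME>__<KEY> convention).
export PYICEBERG_CATALOG__PRODUCTION__TYPE=rest
export PYICEBERG_CATALOG__PRODUCTION__URI=https://localhost:8181/catalog
export PYICEBERG_CATALOG__PRODUCTION__WAREHOUSE=analytics
```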

The changes I made do not require the user to do much more than that, and they can continue using Iceberg as they were.

Related Issues

Additional Context

To test it, I ran this for some time by monkey-patching the core package. I then added my contributions, built the package, and tried it again through two of my production pipelines.

As for the test suite, I ran it locally, but I came across some tests that I believe were already broken, since the failures had absolutely nothing to do with my changes. I will be checking them in more detail tomorrow.

Successfully merging this pull request may close these issues:

Iceberg destination only supports in-memory SQLite catalog - blocks REST and production catalogs