
Iceberg destination only supports in-memory SQLite catalog - blocks REST and production catalogs #3324

@lfagliano

Description


Feature description

I would like to add support for Iceberg catalogs other than SQLite to the OSS version of DLT. I believe this is not a complicated addition: pyiceberg's load_catalog already loads catalog configs from .pyiceberg.yaml, so we can use it to resolve the catalog while the rest of the implementation continues as normal.

Are you a dlt user?

Yes, I run dlt in production.

Use case

Yes! So far DLT is awesome, but its integration with Iceberg catalogs is incomplete (understandably, as this is offered in DLT+). While I understand that DLT+ may offer much better features on top of this, I think the option to connect to more catalogs in the OSS version is key to making DLT a top-tier ingestion framework (which it already is, but this would be like top tier ++++++ 😄)

Proposed solution

I am preparing a PR that uses the load_catalog function in pyiceberg. Most of the existing code stays the same: we call load_catalog to find the catalog and return it to the pipeline as normal. I also added other elements, such as constructing the catalog config when required, but this is the core of the change.

Right now I use this approach by monkey-patching DLT before running the pipeline. I am currently running tests and making some corrections to the PR.
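As a rough illustration of the idea (not the code in the PR), the catalog lookup could be sketched as below. The helper names `build_catalog_properties` and `get_catalog` are hypothetical and are not part of DLT or pyiceberg; the only pyiceberg call used is `pyiceberg.catalog.load_catalog(name, **properties)`, which also picks up any catalog configured under that name in .pyiceberg.yaml.

```python
from typing import Dict, Optional


def build_catalog_properties(
    catalog_type: Optional[str] = None,
    uri: Optional[str] = None,
    warehouse: Optional[str] = None,
    extra: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: collect only the values that were set explicitly.

    Anything left unset is omitted, so pyiceberg can fill it in from
    .pyiceberg.yaml or its environment variables.
    """
    props: Dict[str, str] = {}
    if catalog_type:
        props["type"] = catalog_type  # e.g. "rest", "sql", "hive", "glue"
    if uri:
        props["uri"] = uri
    if warehouse:
        props["warehouse"] = warehouse
    props.update(extra or {})
    return props


def get_catalog(name: str = "default", **overrides):
    """Hypothetical helper: resolve a catalog by name via pyiceberg."""
    # Imported lazily so this module loads even without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **build_catalog_properties(**overrides))
```

With a REST catalog this might be called as `get_catalog("prod", catalog_type="rest", uri="http://localhost:8181")`, where the catalog name and URI are placeholders. Calling `get_catalog("prod")` with no overrides relies entirely on the configuration pyiceberg finds for that name.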

Related issues

No response
