Iceberg/storage: storage implementation from iceberg s3 #203

Open
laskoviymishka opened this issue Feb 6, 2025 · 0 comments · May be fixed by #208
Labels
enhancement New feature or request

Comments

laskoviymishka commented Feb 6, 2025

Implement Iceberg Storage

Feature Request

Develop an Iceberg storage, focusing on reading data from Iceberg data files.
The Delta implementation is used as the reference.


Scope and Goals

List tables from the Iceberg catalog.
Infer the schema and convert it to the transfer internal type system.
Read data and convert Arrow records to Change Items.
Basic e2e tests with a test container for S3.

🚫 Replication and CDC handling are out of scope.
🚫 No sharding or performance tuning in this phase.
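
The schema-conversion goal could be sketched roughly as follows. This is only an illustration under stated assumptions: `TransferType`, its constants, and `MapIcebergType` are hypothetical names, not the project's actual type system or API. The input strings are the primitive type names from the Iceberg table spec; nested types (and, in this sketch, any primitive not listed) fall back to `any`.

```go
package main

import "fmt"

// TransferType is a hypothetical stand-in for the transfer internal type system.
type TransferType string

// Hypothetical target types; the real type system may use different names.
const (
	TypeBoolean   TransferType = "boolean"
	TypeInt32     TransferType = "int32"
	TypeInt64     TransferType = "int64"
	TypeFloat     TransferType = "float"
	TypeDouble    TransferType = "double"
	TypeString    TransferType = "string"
	TypeBytes     TransferType = "bytes"
	TypeDate      TransferType = "date"
	TypeTimestamp TransferType = "timestamp"
	TypeAny       TransferType = "any"
)

// primitiveTypes maps Iceberg primitive type names to the hypothetical
// transfer types above. The list is deliberately incomplete.
var primitiveTypes = map[string]TransferType{
	"boolean":   TypeBoolean,
	"int":       TypeInt32,
	"long":      TypeInt64,
	"float":     TypeFloat,
	"double":    TypeDouble,
	"string":    TypeString,
	"binary":    TypeBytes,
	"date":      TypeDate,
	"timestamp": TypeTimestamp,
}

// MapIcebergType returns the transfer type for an Iceberg type name.
// Nested types (struct, list, map) and unlisted names map to TypeAny.
func MapIcebergType(name string) TransferType {
	if t, ok := primitiveTypes[name]; ok {
		return t
	}
	return TypeAny
}

func main() {
	for _, name := range []string{"long", "string", "struct", "list"} {
		fmt.Printf("%s -> %s\n", name, MapIcebergType(name))
	}
}
```

A lookup table with an `any` fallback keeps the first phase simple: new primitives can be added one entry at a time without touching the nested-type path.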


Implementation Details

  • Read format:

    • Use the official iceberg-go library for metadata catalogs
    • Apache Arrow reader for reading data files
  • Type System Support:

    • Primitives - mapped to the transfer type system
    • Structural types - mapped to any
  • Configuration Options:

    • Separate settings for catalog type and catalog URI
    • Everything else as a generic iceberg.Props map[string]string, which supports any Iceberg setting out of the box
  • Testing:

    • Basic tests should be in place
    • Docker Compose e2e tests inside CI
    • Docker Compose setup for running tests locally
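
The configuration shape described above might look something like the sketch below. It is a minimal illustration, not the actual implementation: `IcebergSource`, its field names, and `Validate` are hypothetical, and `Props` is declared locally as a plain `map[string]string` so the example has no dependencies. The endpoint values and property keys in `main` are placeholders for a local setup.

```go
package main

import (
	"errors"
	"fmt"
)

// Props is a local stand-in for the generic string-to-string property map
// mentioned in the issue; everything beyond catalog type and URI goes here.
type Props map[string]string

// IcebergSource is a hypothetical config struct: two explicit catalog
// settings plus a pass-through property map.
type IcebergSource struct {
	CatalogType string // e.g. "rest"; accepted values are an assumption
	CatalogURI  string
	Props       Props
}

// Validate checks only the two explicit settings; Props are passed through
// untouched so that any Iceberg setting can be supplied without code changes.
func (s IcebergSource) Validate() error {
	if s.CatalogType == "" {
		return errors.New("catalog type is required")
	}
	if s.CatalogURI == "" {
		return errors.New("catalog URI is required")
	}
	return nil
}

func main() {
	src := IcebergSource{
		CatalogType: "rest",
		CatalogURI:  "http://localhost:8181", // placeholder endpoint
		Props: Props{
			// Illustrative S3 settings for a local object store.
			"s3.endpoint": "http://localhost:9000",
			"s3.region":   "us-east-1",
		},
	}
	if err := src.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("catalog:", src.CatalogType, src.CatalogURI)
}
```

Keeping only the catalog type and URI as typed fields means the storage does not need to know every property the underlying library understands; the map is simply forwarded.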

References & Inspiration

@laskoviymishka laskoviymishka added the enhancement New feature or request label Feb 6, 2025
laskoviymishka added a commit that referenced this issue Feb 7, 2025
Implement storage for Iceberg: the ability to list tables, estimate row counts, and read data.
A docker-compose recipe is in place for e2e tests; these tests run inside CI.

Closes: #203
@laskoviymishka laskoviymishka linked a pull request Feb 7, 2025 that will close this issue