Icebird is a library for reading Apache Iceberg tables in JavaScript. It is built on top of hyparquet, which reads the underlying Parquet files.
To read an Iceberg table:
const { icebergRead } = await import('icebird')
const tableUrl = 'https://s3.amazonaws.com/hyperparam-iceberg/spark/bunnies'
const data = await icebergRead({
  tableUrl,
  rowStart: 0,
  rowEnd: 10,
})
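For paging through a large table, the `rowStart` and `rowEnd` options can be derived from a page number. The following is a minimal sketch, not part of icebird: `pageToRowRange` and `readPage` are hypothetical helper names, and the sketch assumes `rowEnd` is exclusive (as it is in hyparquet), so that `rowStart: 0, rowEnd: 10` covers ten rows.

```javascript
// Hypothetical helper (not part of icebird): convert a zero-based page
// number into row bounds. Assumes rowEnd is exclusive.
function pageToRowRange(page, pageSize) {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError('page must be a non-negative integer')
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError('pageSize must be a positive integer')
  }
  const rowStart = page * pageSize
  return { rowStart, rowEnd: rowStart + pageSize }
}

// Hypothetical wrapper: read a single page of rows from a table.
async function readPage(tableUrl, page, pageSize = 10) {
  const { icebergRead } = await import('icebird')
  return icebergRead({ tableUrl, ...pageToRowRange(page, pageSize) })
}
```

With these helpers, `readPage(tableUrl, 0)` would request the same rows as the example above.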
To read the Iceberg metadata (schema, etc.):
import { icebergMetadata } from 'icebird'
const metadata = await icebergMetadata({ tableUrl })
// subsequent reads will be faster if you provide the metadata:
const data = await icebergRead({
  tableUrl,
  metadata,
})
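When the same table is read many times, the metadata lookup can be memoized so that it is fetched only once per table URL. This is a rough sketch under stated assumptions: `createMetadataCache` and `readWithCachedMetadata` are hypothetical names, not icebird exports, and the loader is passed in as a parameter so that `icebergMetadata` (or any function with the same `{ tableUrl }` argument shape) can be used.

```javascript
// Hypothetical helper (not part of icebird): memoize metadata lookups
// per table URL. `loadMetadata` is any function that accepts
// { tableUrl } and returns metadata or a promise of it.
function createMetadataCache(loadMetadata) {
  const cache = new Map()
  return function getMetadata(tableUrl) {
    if (!cache.has(tableUrl)) {
      const pending = Promise.resolve(loadMetadata({ tableUrl }))
      // Forget failed lookups so that a later call can retry
      pending.catch(() => cache.delete(tableUrl))
      cache.set(tableUrl, pending)
    }
    return cache.get(tableUrl)
  }
}

// Hypothetical wrapper: read rows, reusing cached metadata.
async function readWithCachedMetadata(getMetadata, tableUrl, rowStart, rowEnd) {
  const { icebergRead } = await import('icebird')
  const metadata = await getMetadata(tableUrl)
  return icebergRead({ tableUrl, metadata, rowStart, rowEnd })
}
```

A cache would be created once, for example with `createMetadataCache(icebergMetadata)`, and shared between reads. Note that cached metadata will not reflect snapshots committed to the table after the first lookup.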
To fetch a previous version of the table, you can specify `metadataFileName`:
import { icebergRead } from 'icebird'
const data = await icebergRead({
  tableUrl,
  metadataFileName: 'v1.metadata.json',
})
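If the table's metadata files follow the `v<N>.metadata.json` pattern seen in the example above, the file name can be built from a version number. This is only a sketch: `metadataFileNameForVersion` and `readVersion` are hypothetical helpers, and the naming pattern is an assumption that holds for some file-based tables but not necessarily for every table.

```javascript
// Hypothetical helper (not part of icebird): build a metadata file name
// from a version number. Assumes the table names its metadata files
// v1.metadata.json, v2.metadata.json, and so on.
function metadataFileNameForVersion(version) {
  if (!Number.isInteger(version) || version < 1) {
    throw new RangeError('version must be a positive integer')
  }
  return `v${version}.metadata.json`
}

// Hypothetical wrapper: read a table as of a given metadata version.
async function readVersion(tableUrl, version) {
  const { icebergRead } = await import('icebird')
  return icebergRead({
    tableUrl,
    metadataFileName: metadataFileNameForVersion(version),
  })
}
```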
You can add authentication to all HTTP requests by passing a `requestInit` argument, which is passed through to `fetch`:
import { icebergRead } from 'icebird'
const data = await icebergRead({
  tableUrl,
  requestInit: {
    headers: {
      Authorization: 'Bearer my_token',
    },
  },
})
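In practice the token is usually supplied at runtime rather than hard-coded. The sketch below builds the `requestInit` object from a token value; `bearerRequestInit` and `readWithToken` are hypothetical helper names, not icebird exports, and where the token comes from (an environment variable, a secrets store) is left to the caller.

```javascript
// Hypothetical helper (not part of icebird): build a fetch RequestInit
// that carries a bearer token, optionally merged with other headers.
function bearerRequestInit(token, extraHeaders = {}) {
  if (typeof token !== 'string' || token.length === 0) {
    throw new TypeError('token must be a non-empty string')
  }
  return {
    headers: {
      ...extraHeaders,
      // Set last so that extraHeaders cannot override the token
      Authorization: `Bearer ${token}`,
    },
  }
}

// Hypothetical wrapper: read a table with an authenticated request.
async function readWithToken(tableUrl, token) {
  const { icebergRead } = await import('icebird')
  return icebergRead({ tableUrl, requestInit: bearerRequestInit(token) })
}
```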
Icebird aims to support reading any Iceberg table, but currently supports only a subset of Iceberg features. The table below shows which features are supported:
Feature | Supported |
---|---|
Read Iceberg v1 Tables | ✅ |
Read Iceberg v2 Tables | ✅ |
Read Iceberg v3 Tables | ❌ |
Parquet Storage | ✅ |
Avro Storage | ✅ |
ORC Storage | ❌ |
Puffin Storage | ❌ |
File-based Catalog (version-hint.text) | ✅ |
REST Catalog | ❌ |
Hive Catalog | ❌ |
Glue Catalog | ❌ |
Service-based Catalog | ❌ |
Position Deletes | ✅ |
Equality Deletes | ✅ |
Binary Deletion Vectors | ❌ |
Rename Columns | ✅ |
Efficient Partitioned Read Queries | ❌ |
All Parquet Compression Codecs | ✅ |
All Parquet Types | ✅ |
Variant Types | ❌ |
Geometry Types | ❌ |
Geography Types | ❌ |
Sorting | ❌ |
Encryption | ❌ |