Icebird: JavaScript Iceberg Reader

Icebird is a library for reading Apache Iceberg tables in JavaScript. It is built on top of hyparquet, which reads the underlying Parquet files.

Usage

To read an Iceberg table:

const { icebergRead } = await import('icebird')

const tableUrl = 'https://s3.amazonaws.com/hyperparam-iceberg/spark/bunnies'
const data = await icebergRead({
  tableUrl,
  rowStart: 0,
  rowEnd: 10,
})
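
For larger tables it can help to read fixed-size windows of rows instead of one big range. The following is a minimal sketch of that pattern built on rowStart and rowEnd. readPages is a hypothetical helper, not part of icebird. The read function is passed in by the caller, so icebergRead can be supplied as that argument. The sketch assumes that the read resolves to an array of rows, that rowEnd is exclusive (as in hyparquet), and that a page shorter than pageSize marks the end of the table.

```javascript
// Hypothetical helper (not part of icebird): yield rows one page at a time.
// `read` is any async function that accepts the same options as icebergRead.
// Assumptions: rowEnd is exclusive, and a short page means no rows remain.
async function* readPages(read, { tableUrl, pageSize = 1000, ...rest }) {
  for (let rowStart = 0; ; rowStart += pageSize) {
    const rows = await read({
      ...rest,
      tableUrl,
      rowStart,
      rowEnd: rowStart + pageSize,
    })
    // Skip the empty trailing page when the row count is a multiple of pageSize
    if (rows.length > 0) yield rows
    if (rows.length < pageSize) return
  }
}
```

A caller could then iterate with `for await (const page of readPages(icebergRead, { tableUrl, pageSize: 100 }))` and stop early by breaking out of the loop.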

To read the Iceberg metadata (schema, etc.):

import { icebergMetadata } from 'icebird'

const metadata = await icebergMetadata({ tableUrl })

// subsequent reads will be faster if you provide the metadata:
const data = await icebergRead({
  tableUrl,
  metadata,
})
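
When several reads hit the same table, the metadata can be fetched once and shared. The sketch below shows one way to do that. createMetadataCache is a hypothetical helper, not part of icebird, and it takes the metadata function as an argument so that icebergMetadata can be passed in. It assumes that function accepts `{ tableUrl }` and returns a promise.

```javascript
// Hypothetical helper (not part of icebird): fetch metadata once per table URL.
// `fetchMetadata` is any function shaped like icebergMetadata({ tableUrl }).
function createMetadataCache(fetchMetadata) {
  const cache = new Map()
  return function getMetadata(tableUrl) {
    if (!cache.has(tableUrl)) {
      // Cache the promise itself so concurrent callers share one request
      const pending = Promise.resolve(fetchMetadata({ tableUrl }))
      // Forget failed lookups so a later call can retry
      pending.catch(() => cache.delete(tableUrl))
      cache.set(tableUrl, pending)
    }
    return cache.get(tableUrl)
  }
}
```

With `const getMetadata = createMetadataCache(icebergMetadata)`, each read can pass `metadata: await getMetadata(tableUrl)`. Cached metadata describes the table as it was when fetched, so a long-lived cache will not see later commits.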

Time Travel

To fetch a previous version of the table, you can specify metadataFileName:

import { icebergRead } from 'icebird'

const data = await icebergRead({
  tableUrl,
  metadataFileName: 'v1.metadata.json',
})
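
If the table follows the `v<N>.metadata.json` naming shown above, the file name can be derived from a version number. The sketch below is a hypothetical helper, not part of icebird, and it assumes that naming scheme. Tables written through other catalogs may name their metadata files differently, in which case the exact file name has to be supplied directly.

```javascript
// Hypothetical helper (not part of icebird): build a metadata file name
// from a version number, assuming the v<N>.metadata.json naming scheme.
function metadataFileNameForVersion(version) {
  if (!Number.isInteger(version) || version < 1) {
    throw new RangeError(`version must be a positive integer, got ${version}`)
  }
  return `v${version}.metadata.json`
}
```

The result can be passed as `metadataFileName: metadataFileNameForVersion(1)`.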

Authentication

You can add authentication to all HTTP requests by passing a requestInit argument, which is forwarded to fetch:

import { icebergRead } from 'icebird'

const data = await icebergRead({
  tableUrl,
  requestInit: {
    headers: {
      Authorization: 'Bearer my_token',
    },
  }
})
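
To avoid hard-coding the token, the requestInit object can be built from a value supplied at runtime. The sketch below is one way to do that. bearerRequestInit is a hypothetical helper, not part of icebird, and where the token comes from (an environment variable, a secrets store) is left to the caller.

```javascript
// Hypothetical helper (not part of icebird): build a fetch-style
// requestInit object that carries a bearer token.
function bearerRequestInit(token, extraHeaders = {}) {
  if (typeof token !== 'string' || token.length === 0) {
    throw new Error('a non-empty token is required')
  }
  return {
    headers: {
      ...extraHeaders,
      // Set last so extraHeaders cannot override the token
      Authorization: `Bearer ${token}`,
    },
  }
}
```

The result can be passed as `requestInit: bearerRequestInit(token)`. Failing early on a missing token surfaces a configuration mistake before any request is sent.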

Supported Features

Icebird aims to read any Iceberg table, but currently supports only a subset of Iceberg features. The table below lists each feature and its support status:

Feature Supported
Read Iceberg v1 Tables
Read Iceberg v2 Tables
Read Iceberg v3 Tables
Parquet Storage
Avro Storage
ORC Storage
Puffin Storage
File-based Catalog (version-hint.text)
REST Catalog
Hive Catalog
Glue Catalog
Service-based Catalog
Position Deletes
Equality Deletes
Binary Deletion Vectors
Rename Columns
Efficient Partitioned Read Queries
All Parquet Compression Codecs
All Parquet Types
Variant Types
Geometry Types
Geography Types
Sorting
Encryption

References