Embucket: Snowflake compatible lakehouse platform

Embucket is an Apache‑2.0‑licensed, Snowflake‑compatible lakehouse platform built for radical simplicity and full openness. It delivers:

- A Snowflake‑style REST API and SQL dialect
- Apache Iceberg table format under the hood
- Iceberg REST API for both internal and external engines
- Zero‑disk, object‑store‑only architecture (S3 or memory)
- Statically linked single binary for effortless deployment
- SlateDB for metadata persistence
- Apache DataFusion as the query engine

Features

- Snowflake compatible API
  - Snowflake SQL syntax dialect
  - Snowflake v1 wire-compatible REST API
- Apache Iceberg
  - Data stored in Iceberg format on object storage
  - Built-in internal catalog
  - Exposes Iceberg REST Catalog API
- Zero disk architecture
  - All state (data + metadata) lives in your S3 buckets
  - No other dependencies required
- Scalable "query-per-node" parallelism
  - Spin up multiple Embucket instances against the same bucket
  - Each node handles queries independently for horizontal scale
- Single statically linked binary
  - One embucket executable with zero external dependencies
- Iceberg catalog federation
  - Connect to external Iceberg REST catalogs
  - Read/write across catalogs

Architecture

Embucket is designed with radical simplicity in mind: it is a single binary that runs as a server and provides a REST API for interacting with the lakehouse. It has a single dependency, object storage, which holds both data and metadata.

It is built on top of several open source projects:

- Apache DataFusion - query engine
- Apache Iceberg - data storage
- SlateDB - metadata persistence

Embucket has deep integration with AWS S3 table buckets and relies on them for proper table maintenance.

SLT coverage

These visualizations are automatically updated by CI/CD when tests are run.

DBT Gitlab run results:

This visualization is automatically updated daily by CI/CD.

Install Embucket

# Clone and build the Embucket binary
git clone git@github.com:Embucket/embucket.git
cd embucket/
cargo build

Configure and run Embucket

You can configure Embucket via CLI arguments or environment variables:

# Create a .env configuration file
cat << 'EOF' > .env
# SlateDB storage settings
OBJECT_STORE_BACKEND=memory
FILE_STORAGE_PATH=data
SLATEDB_PREFIX=sdb

# Optional: AWS S3 storage (leave blank if using local storage)
AWS_ACCESS_KEY_ID="<your_aws_access_key_id>"
AWS_SECRET_ACCESS_KEY="<your_aws_secret_access_key>"
AWS_REGION="<your_aws_region>"
S3_BUCKET="<your_s3_bucket>"
S3_ALLOW_HTTP=

# Iceberg Catalog settings
# Set to your catalog url
CATALOG_URL=http://127.0.0.1:3000

# Optional: CORS settings
CORS_ENABLED=true
CORS_ALLOW_ORIGIN=http://localhost:8080

EOF

# Load environment variables (optional)
export $(grep -v '^#' .env | xargs)

# Start Embucket
./target/debug/bucketd
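The `export $(grep -v '^#' .env | xargs)` line above relies on word splitting, so a value that contains spaces would be exported incorrectly. A minimal alternative sketch follows, assuming the .env file holds only shell-style KEY=value lines and comments, as the example above does. It sources the file with auto-export switched on. `load_env` is a hypothetical helper name for illustration, not something Embucket ships.

```shell
# Hypothetical helper: export every KEY=value assignment from a dotenv-style file.
# Assumes the file contains only shell-compatible assignments and comments.
load_env() {
  env_file="${1:-.env}"

  # The "." builtin may search PATH for names without a slash, so anchor bare names.
  case "$env_file" in
    */*) ;;
    *) env_file="./$env_file" ;;
  esac

  if [ ! -f "$env_file" ]; then
    echo "load_env: $env_file not found" >&2
    return 1
  fi

  set -a            # auto-export every variable assigned from here on
  . "$env_file"     # run the assignments in the current shell
  status=$?
  set +a            # stop auto-exporting
  return "$status"
}
```

A possible use is `load_env .env && ./target/debug/bucketd`. Sourcing executes the file as shell code, so this is only appropriate for a .env file you trust.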

Once Embucket is running, open:

localhost:8080 → UI Dashboard
localhost:3000/catalog → Iceberg REST Catalog API
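When Embucket is started from a script, it can help to wait until the two addresses above respond before continuing. The sketch below assumes `curl` is installed. `wait_for_url` is a hypothetical helper, and it treats any HTTP response as "up", because this README does not state which status code each path returns.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 1).
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s hides progress output, -o /dev/null discards the body.
    # Without -f, curl exits 0 for any HTTP response, including error statuses.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "wait_for_url: no response from $url" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:8080 && wait_for_url http://localhost:3000/catalog` would block until both the UI and the catalog port accept connections, or fail after roughly 30 attempts each.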

Demo: running dbt project with Embucket

This demo showcases how to use Embucket with dbt and execute the snowplow_web dbt project, treating Embucket as a Snowflake-compatible database.

Prerequisites

- Python 3.9+ installed
- virtualenv installed

Run dbt workflow

# Clone the dbt project with Snowplow package installed
git clone git@github.com:Embucket/compatibility-test-suite.git
cd compatibility-test-suite/dbt-snowplow/

# Set up and activate a virtual environment, then install dependencies
virtualenv .venv
source .venv/bin/activate
pip install dbt-core dbt-snowflake

# Set Snowflake-like environment variables
export SNOWFLAKE_USER=user
export SNOWFLAKE_PASSWORD=xxx
export SNOWFLAKE_DB=snowplow
export SNOWFLAKE_SCHEMA=public
export SNOWFLAKE_WAREHOUSE=snowplow

# Install the dbt Snowplow package
dbt deps

# Upload source data
python3 upload.py

# Upload initial data
dbt seed

# Run dbt transformations
dbt run -m snowplow_web
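The workflow above depends on several `SNOWFLAKE_*` variables being exported before dbt runs. A small pre-flight check can point out a forgotten export before any dbt command is issued. This is a sketch under the assumption that the project's dbt profile reads exactly the variables listed above. `require_env` is a hypothetical helper name.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
require_env() {
  missing=0
  for name in "$@"; do
    # eval-based indirection keeps this compatible with plain POSIX sh
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be called as `require_env SNOWFLAKE_USER SNOWFLAKE_PASSWORD SNOWFLAKE_DB SNOWFLAKE_SCHEMA SNOWFLAKE_WAREHOUSE && dbt deps`, so that the dbt steps only start when every variable is present.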

Debugging

See TRACING_AND_PROFILING.md for information on how to trace and profile the Embucket service.

Contributing

We welcome contributions! To get involved:

- Fork the repository on GitHub
- Create a new branch for your feature or bug fix
- Submit a pull request with a detailed description

For more details, see CONTRIBUTING.md.

License

This project is licensed under the Apache 2.0 License. See LICENSE for details.
