🧊 Realtime Lakehouse Stack

End-to-end open-source data platform — from operational Postgres to actionable insights in Metabase, powered by Debezium, Apache Iceberg, Trino, and LakeKeeper.

🚀 Overview

realtime-lakehouse-stack is a production-style data architecture built entirely with open-source components.
It demonstrates how organizations can move from raw operational data to real-time analytics using modern data lakehouse technologies.

This stack simulates a complete data flow:

1. Postgres (TimescaleDB): operational database capturing business transactions
2. Debezium: streams real-time changes from Postgres
3. MinIO + Apache Iceberg: scalable data lakehouse for structured, versioned data
4. LakeKeeper: metadata and governance for Iceberg tables
5. Trino: distributed SQL query engine for fast, interactive analysis
6. Metabase: self-service BI dashboards for decision-makers
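
To make the change-capture step more concrete, here is a minimal sketch of how a consumer might apply Debezium-style change events to a copy of a table. It relies only on the envelope fields Debezium documents (`op`, `before`, `after`); the `orders` rows, their columns, and the `apply_change_event` helper are hypothetical and not taken from this repository. Real events may also be wrapped in a `payload` field depending on converter settings, and how this stack lands events in Iceberg is not shown here.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_change_event(table: Dict[Any, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory copy of a table."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: "after" holds the new row state
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # delete: only "before" is populated
        table.pop(event["before"][key], None)
    else:
        # other operation codes (e.g. truncate) are out of scope for this sketch
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical events for an assumed "orders" table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new", "amount": 40.0}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new", "amount": 15.5}},
    {"op": "u", "before": {"id": 1, "status": "new", "amount": 40.0},
     "after": {"id": 1, "status": "paid", "amount": 40.0}},
    {"op": "d", "before": {"id": 2, "status": "new", "amount": 15.5}, "after": None},
]

orders: Dict[Any, Row] = {}
for ev in events:
    apply_change_event(orders, ev)

print(orders)  # {1: {'id': 1, 'status': 'paid', 'amount': 40.0}}
```

Replaying events in order is what keeps the downstream copy in step with Postgres: the insert for order 2 is later cancelled by its delete, and order 1 ends up in its updated state.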

🧱 Architecture at a Glance

| Component | Role in the Platform | Business Translation |
| --- | --- | --- |
| TimescaleDB (Postgres) | Operational data storage | Records all business operations reliably |
| Debezium | Real-time change capture | Changes reach analytics continuously, without manual ETL |
| MinIO + Iceberg | Structured data lake | Central repository of clean, queryable, scalable data |
| LakeKeeper | Metadata/catalog management | Teams always know what data exists and how to access it safely |
| Trino | Distributed query engine | Analysts and BI tools get fast access to large datasets |
| Metabase | BI dashboards | Converts data into actionable insights for executives |

💼 Why It Matters for Organizations

Modern companies need real-time decision-making based on operational data — not nightly batch ETL.
This stack provides a blueprint for building your own in-house, cost-efficient, and open-source data lakehouse that can scale with business needs.

Use it to:

- Prototype modern data architectures
- Train data teams on streaming + lakehouse concepts
- Showcase end-to-end integration for consultancy or client pitches

🏗️ Quick Start

git clone https://github.com/yourusername/realtime-lakehouse-stack.git
cd realtime-lakehouse-stack
docker compose up -d

Once started:

- Metabase → http://localhost:3000
- Trino UI → http://localhost:8080
- MinIO Console → http://localhost:9001
- LakeKeeper API → http://localhost:8181
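
Containers can take a while to come up after `docker compose up -d`, so it may help to check the endpoints before opening them. The following is a small sketch using only the Python standard library; the URLs are the ones listed above, while the `http_probe` and `check_services` helpers are illustrative names, not part of this repository.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Endpoints as listed in the Quick Start section.
ENDPOINTS = {
    "Metabase": "http://localhost:3000",
    "Trino UI": "http://localhost:8080",
    "MinIO Console": "http://localhost:9001",
    "LakeKeeper API": "http://localhost:8181",
}


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at the URL, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server responded (e.g. 401 or 404), so the port is up
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or name resolution failure


def check_services(
    endpoints: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, bool]:
    """Probe every endpoint and report which ones are reachable."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Calling `check_services(ENDPOINTS)` returns a dictionary such as `{"Metabase": True, ...}`. A `True` only means something is listening on the port, not that the service has finished initializing.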

🤖 Data Transformation: From Raw Data to Business Insights with dbt

This stage transforms the raw data from our operational systems into clean, reliable, business-ready datasets. We use dbt (data build tool), a widely used SQL transformation framework, to build our Bronze, Silver, and Gold data layers.

The dbt project is located in the warehouse directory and is pre-configured to connect to our Trino query engine, which reads from the Iceberg data lake.
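
For readers new to the Bronze, Silver, and Gold layering mentioned above, here is a plain-Python sketch of the idea. It is not the project's dbt SQL: the table names `enriched_orders` and `daily_sales` come from this README, but the columns, the customer lookup, and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Bronze: raw order rows as they might land from change capture (hypothetical columns).
bronze_orders = [
    {"order_id": 1, "customer_id": 10, "amount": "40.00", "ordered_at": "2024-05-01T09:15:00"},
    {"order_id": 2, "customer_id": 11, "amount": "15.50", "ordered_at": "2024-05-01T17:40:00"},
    {"order_id": 3, "customer_id": 10, "amount": "22.25", "ordered_at": "2024-05-02T08:05:00"},
]
customers = {10: "Acme Ltd", 11: "Globex"}  # hypothetical reference data


def build_silver(orders: List[Dict[str, Any]], names: Dict[int, str]) -> List[Dict[str, Any]]:
    """Silver: typed, cleaned rows joined with reference data."""
    return [
        {
            "order_id": o["order_id"],
            "customer_name": names.get(o["customer_id"], "unknown"),
            "amount": float(o["amount"]),
            "order_date": o["ordered_at"][:10],  # keep only YYYY-MM-DD
        }
        for o in orders
    ]


def build_gold(silver: List[Dict[str, Any]]) -> Dict[str, float]:
    """Gold: one aggregated total per day."""
    totals: Dict[str, float] = defaultdict(float)
    for row in silver:
        totals[row["order_date"]] += row["amount"]
    return dict(totals)


enriched_orders = build_silver(bronze_orders, customers)
daily_sales = build_gold(enriched_orders)
print(daily_sales)  # {'2024-05-01': 55.5, '2024-05-02': 22.25}
```

In the actual project each layer would be a dbt model executed by Trino against Iceberg tables; the sketch only shows how each layer depends on the one before it.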

Running the dbt Models

Follow these steps to run the transformations that build our Silver and Gold tables:

Navigate to the dbt project directory: This is the heart of our transformation logic.
```
cd warehouse
```
Run the dbt transformations: The following command executes all the dbt models. It uses the Python virtual environment (.venv) where dbt is installed and tells dbt to use the local profiles.yml for the Trino connection.
```
../.venv/bin/dbt run --profiles-dir .
```
You will see dbt connect to Trino and build the enriched_orders (Silver) and daily_sales (Gold) tables in your Iceberg data lake.

Pro-tip: If you have the project's virtual environment activated in your shell (source ../.venv/bin/activate), you can use the shorter command dbt run --profiles-dir . instead (note the single dot, which points at the current directory).
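
If you would rather trigger the same run from Python, for example from a script or a future orchestration task, a thin wrapper around the command above could look like the sketch below. The `build_dbt_command` and `run_dbt` helpers are hypothetical and not part of this repository; the sketch assumes the POSIX virtual environment layout (`bin/dbt`) used in the commands above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_dbt_command(venv_dir: Path) -> List[str]:
    """Build the same command as `../.venv/bin/dbt run --profiles-dir .`."""
    # Resolve to an absolute path so the executable is found regardless of cwd.
    dbt = venv_dir.resolve() / "bin" / "dbt"
    return [str(dbt), "run", "--profiles-dir", "."]


def run_dbt(project_dir: Path, venv_dir: Path) -> int:
    """Run dbt inside the project directory and return its exit code."""
    cmd = build_dbt_command(venv_dir)
    # cwd plays the role of `cd warehouse`, so "." refers to the dbt project.
    completed = subprocess.run(cmd, cwd=project_dir, check=False)
    return completed.returncode
```

From the repository root, `run_dbt(Path("warehouse"), Path(".venv"))` would mirror the two manual steps. A non-zero return value means dbt reported a failure.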

This process is the key to turning the real-time change data captured by Debezium into actionable insights in Metabase.

🧪 Coming Next

Airflow orchestration example

⭐ Support the Project

If you find this stack helpful, please ⭐ star the repository and share it on LinkedIn!
Your support helps promote open-source data platforms and real-time analytics innovation.

Repository: SiddiqueAhmad/realtime-lakehouse-stack
