chocoapp/dataeng-dbt-colibri

 
 

dbt-colibri

A lightweight, developer-friendly CLI tool and self-hostable dashboard for extracting and visualizing column-level lineage from your dbt projects.

Built for data teams who want transparent, flexible lineage tracking without vendor lock-in or complex enterprise tooling.

Why dbt-colibri?

  • 🔍 Complete visibility: An easy-to-use UI for tracking how every column flows through your dbt transformations
  • ⚡ Fast & lightweight: Generate reports in seconds from your existing dbt artifacts
  • 🏠 Self-hosted: No cloud dependencies or external services required

Live demo of dashboard: https://demo.colibri-data.com/

Documentation site: https://www.colibri-data.com/docs

Quick Start

Installation

# Using uv (recommended)
uv add dbt-colibri

# Using pip
pip install dbt-colibri

Basic Usage

  1. Run dbt to generate the required artifacts:

    dbt compile
    dbt docs generate
  2. Generate lineage report:

    colibri generate
  3. View results: Open dist/index.html in your browser

That's it! Your column lineage dashboard is ready. Note that you can also use dbt run or dbt build to generate manifest.json; catalog.json still comes from dbt docs generate.
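Because colibri reads both target/manifest.json and target/catalog.json, it can help to confirm that both exist before generating the report. The following is a minimal sketch, not part of dbt-colibri itself; the helper name missing_artifacts is hypothetical, and the file names come from the defaults documented in this README.

```python
from pathlib import Path

# Artifact file names that colibri reads by default (per this README).
REQUIRED_ARTIFACTS = ("manifest.json", "catalog.json")


def missing_artifacts(target_dir="target"):
    """Return the required dbt artifact names that are absent from target_dir."""
    target = Path(target_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (target / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        # Hypothetical message; adjust to your own tooling.
        print("Missing dbt artifacts:", ", ".join(missing))
    else:
        print("Both artifacts found; ready to run colibri generate.")
```

If catalog.json is reported missing, dbt docs generate has most likely not been run yet.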

Documentation

CLI Commands

colibri generate

Generates column lineage reports from your dbt project.

colibri generate [OPTIONS]

Options:

  • --manifest-path: Path to dbt manifest.json (default: target/manifest.json)
  • --catalog-path: Path to dbt catalog.json (default: target/catalog.json)
  • --output-dir: Output directory (default: dist/)
  • --help: Show help message
  • --light: For very large dbt projects; excludes attributes such as compiled SQL and produces a smaller HTML file.
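When colibri is called from a script, the options above can be assembled programmatically. This is a sketch under two assumptions that the README does not state explicitly: that path options take their value as a separate argument, and that --light is a boolean flag without a value. The function names build_generate_command and run_generate are hypothetical.

```python
import subprocess


def build_generate_command(manifest_path=None, catalog_path=None,
                           output_dir=None, light=False):
    """Assemble the argument list for `colibri generate`.

    Options left as None are omitted so colibri falls back to its defaults.
    """
    cmd = ["colibri", "generate"]
    if manifest_path:
        cmd += ["--manifest-path", manifest_path]
    if catalog_path:
        cmd += ["--catalog-path", catalog_path]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if light:
        # Assumption: --light is a plain on/off flag.
        cmd.append("--light")
    return cmd


def run_generate(**options):
    """Run colibri with the given options; requires colibri on the PATH."""
    return subprocess.run(build_generate_command(**options), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_generate(...) to execute it.
    print(" ".join(build_generate_command(output_dir="public", light=True)))
```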

Output Files

  • colibri-manifest.json: Lineage data
  • index.html: Interactive (standalone) visualization dashboard

Project Structure

your-dbt-project/
├── target/
│   ├── manifest.json    # Generated by dbt
│   └── catalog.json     # Generated by dbt docs generate
└── dist/                # Generated by colibri
    ├── index.html       # Interactive dashboard
    └── colibri-manifest.json

Advanced Usage

CI/CD Integration

The easiest way to deploy the static HTML is through GitHub Pages or GitLab Pages (on an enterprise license, the site can be kept private).

You can find the full example workflow at docs/github_pages_example.yml.

General idea

  1. After every change to the production dbt code (a push to the main branch), GitHub Actions will:
    • Set up Python and install dependencies with uv.
    • Compile and generate docs needed for colibri.
    • Run colibri generate to build the static HTML report in the dist/ folder.
  2. The dist/ folder is uploaded as an artifact and deployed natively to GitHub Pages using the official actions/deploy-pages action.
  3. The result is available at your repository’s Pages URL.

GitLab has similar functionality. Another option is to write the files to a bucket and mount it into a web server container such as nginx.
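For CI systems without a ready-made workflow, the build steps can be sketched as a small Python script that runs the documented commands in order and stops at the first failure. This is an illustration only, not the contents of docs/github_pages_example.yml; the names PIPELINE and run_pipeline are hypothetical, and the script assumes dbt and colibri are already installed in the job's environment.

```python
import subprocess

# Commands from the Basic Usage section, in the order CI needs to run them.
PIPELINE = [
    ["dbt", "compile"],
    ["dbt", "docs", "generate"],
    ["colibri", "generate"],
]


def run_pipeline(commands=PIPELINE, runner=subprocess.run):
    """Run each command in order; check=True aborts on the first failure."""
    for cmd in commands:
        runner(cmd, check=True)


if __name__ == "__main__":
    # Print the plan only; call run_pipeline() inside a real CI job.
    for cmd in PIPELINE:
        print(" ".join(cmd))
```

The runner parameter exists so the sequence can be dry-run or tested without invoking dbt. After a successful run, the dist/ folder is what gets published.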

Technical Details

Requirements

  • Python: tested on versions 3.9, 3.11, 3.13

  • Supported dbt Adapters:

    • Snowflake
    • BigQuery
    • Redshift
    • DuckDB
    • Postgres
    • Databricks (limited to SQL models)
    • Athena
    • Trino

dbt Compatibility

dbt-core Version    Status
1.8.x               ✅ Tested
1.9.x               ✅ Tested
1.10.x              ✅ Tested

Architecture

dbt-colibri leverages:

  • SQLGlot for SQL parsing and column lineage extraction
  • dbt artifacts (manifest.json, catalog.json) for metadata
  • Static HTML/JS for zero-dependency dashboard deployment
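To illustrate what the two dbt artifacts contribute, the sketch below joins them at model level: manifest.json supplies each model's upstream nodes, and catalog.json supplies its column names. This is not dbt-colibri's implementation and it stops short of column-level lineage, which additionally requires parsing the compiled SQL. The function names are hypothetical; the keys used (nodes, resource_type, depends_on, columns) are those of the standard dbt artifact layout.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a JSON artifact from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def model_columns(manifest, catalog):
    """Map each model's unique_id to its upstream nodes and column names."""
    catalog_nodes = catalog.get("nodes", {})
    result = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and so on
        columns = catalog_nodes.get(unique_id, {}).get("columns", {})
        result[unique_id] = {
            "depends_on": node.get("depends_on", {}).get("nodes", []),
            "columns": sorted(columns),
        }
    return result


if __name__ == "__main__":
    manifest_file = Path("target/manifest.json")
    catalog_file = Path("target/catalog.json")
    if manifest_file.is_file() and catalog_file.is_file():
        summary = model_columns(load_json(manifest_file), load_json(catalog_file))
        for unique_id, info in summary.items():
            print(unique_id, info["depends_on"], info["columns"])
```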

Contributing

We welcome contributions! Raise an issue or request a feature; if you are open to contributing, let us know in the issue.

Development Setup

# Clone the repository
git clone https://github.com/your-org/dbt-colibri.git
cd dbt-colibri

# Install development dependencies
uv sync --dev

# Run tests
pytest

# Format code
ruff format

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This project builds upon excellent open source work:

