The Future of Data Engineering — A CLI SQL client for the modern data stack, enabling AI-native context engineering for data.

Datus-ai/Datus-agent

License: Apache 2.0

🎯 Overview

Datus is an open-source data engineering agent that builds evolvable context for your data system.

Data engineering needs a shift from "building tables and pipelines" to "delivering scoped, domain-aware agents for analysts and business users."

Figure: Datus architecture

  • Datus-CLI: An AI-powered command-line interface for data engineers—think "Claude Code for data engineers." Write SQL, build subagents, and construct context interactively.
  • Datus-Chat: A web chatbot providing multi-turn conversations with built-in feedback mechanisms (upvotes, issue reports, success stories) for data analysts.
  • Datus-API: APIs for other agents or applications that need stable, accurate data services.

🚀 Key Features

🧩 Contextual Data Engineering

Automatically builds a living semantic map of your company’s data — combining metadata, metrics, reference SQL, and external knowledge — so engineers and analysts collaborate through context instead of raw SQL.

💬 Agentic Chat

A Claude-Code-like CLI for data engineers. Chat with your data, recall tables or metrics instantly, and run agentic actions — all in one terminal.

🧠 Subagents for Every Domain

Turn data domains into domain-aware chatbots. Each subagent encapsulates the right context, tools, and rules — making data access accurate, reusable, and safe.

🔁 Continuous Learning Loop

Every query and every piece of feedback improves the model. Datus learns from success stories and user corrections to improve its reasoning accuracy over time.


🧰 Installation

Requirements: Python >= 3.12

pip install datus-agent==0.2.1

datus-agent init

For detailed installation instructions, see the Quickstart Guide.
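
Since the package requires Python >= 3.12, it can help to check the interpreter before running pip. The snippet below is a minimal sketch of such a check; the helper name `meets_requirement` is hypothetical and is not part of datus-agent.

```python
import sys

# Minimum version taken from the requirement stated above.
MIN_PYTHON = (3, 12)


def meets_requirement(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when (major, minor) is at least `minimum`.

    Hypothetical helper for illustration; not part of datus-agent.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_requirement():
        print("Python version OK for datus-agent")
    else:
        found = ".".join(str(part) for part in sys.version_info[:2])
        print(f"Python >= 3.12 required, found {found}")
```

If the check fails, install a newer interpreter (or create a virtual environment with one) before running the pip command above.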

🧭 User Journey

1️⃣ Initial Exploration

A Data Engineer (DE) starts by chatting with the database using /chat. They ask simple questions, test joins, and refine prompts using @table or @file. Each round of feedback (e.g., "Join table1 and table2 by PK") helps the model improve accuracy.

datus-cli --namespace demo
/Check the top 10 bank by assets lost @Table duckdb-demo.main.bank_failures

Learn more: CLI Introduction

2️⃣ Building Context

The DE imports SQL history and generates summaries or semantic models:

/gen_semantic_model xxx @subject

They edit or refine models in @subject, combining AI-generated drafts with human corrections. Now, /chat can reason using both SQL history and semantic context.

Learn more: Knowledge Base Introduction

3️⃣ Creating a Subagent

When the context matures, the DE defines a domain-specific chatbot (Subagent):

.subagent add mychatbot

They describe its purpose, add rules, choose tools, and limit scope (e.g., 5 tables). Each subagent becomes a reusable, scoped assistant for a specific business area.
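
To make the idea of a scoped subagent concrete, here is a small illustrative model of the pieces listed above: purpose, rules, tools, and a table limit. This is a sketch under stated assumptions, not Datus's actual configuration schema; the class `SubagentSpec`, its fields, and the tool name `run_sql` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SubagentSpec:
    """Hypothetical description of a scoped subagent (not Datus's schema)."""

    name: str
    purpose: str
    rules: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    tables: list[str] = field(default_factory=list)
    max_tables: int = 5  # mirrors the "5 tables" example above

    def validate(self) -> None:
        """Raise ValueError if the spec is unnamed or exceeds its table limit."""
        if not self.name.strip():
            raise ValueError("subagent name must not be empty")
        if len(self.tables) > self.max_tables:
            raise ValueError(
                f"{len(self.tables)} tables exceed the limit of {self.max_tables}"
            )

    def can_access(self, table: str) -> bool:
        """Return True only for tables explicitly placed in scope."""
        return table in self.tables


spec = SubagentSpec(
    name="mychatbot",
    purpose="Answer questions about bank failures",
    rules=["Join table1 and table2 by PK"],
    tools=["run_sql"],  # hypothetical tool name
    tables=["duckdb-demo.main.bank_failures"],
)
spec.validate()
```

Keeping the table list explicit is what makes the assistant scoped: anything not listed is out of reach by construction.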

Learn more: Subagent Introduction

4️⃣ Delivering to Analysts

The Subagent is deployed to a web interface: http://localhost:8501/?subagent=mychatbot
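
The address above selects the subagent through a `subagent` query parameter. As a minimal sketch, a link for any subagent name can be built with Python's standard library; the helper `build_chat_url` is hypothetical, and the default base URL simply reuses the local address shown above.

```python
from urllib.parse import urlencode


def build_chat_url(subagent: str, base: str = "http://localhost:8501/") -> str:
    """Build a chat link for a subagent (hypothetical helper).

    The name is URL-encoded so spaces and special characters stay valid.
    """
    return f"{base}?{urlencode({'subagent': subagent})}"


print(build_chat_url("mychatbot"))
# prints: http://localhost:8501/?subagent=mychatbot
```

Encoding the name rather than concatenating it keeps the link valid when a subagent name contains spaces or reserved characters.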

Analysts chat directly, upvote correct answers, or report issues for feedback. Results can be saved via !export.

Learn more: Web Chatbot Introduction

5️⃣ Refinement & Iteration

Feedback from analysts loops back to improve the subagent: engineers fix SQL, add rules, and update context. Over time, the chatbot becomes more accurate, self-evolving, and domain-aware.

For detailed guidance, please follow our tutorial.
