11 changes: 2 additions & 9 deletions .github/workflows/ci.yml
@@ -1,39 +1,32 @@
name: CI

on:
push:
branches:
- main
pull_request:

jobs:
run-tests:
name: Linting and Unit Tests
runs-on: ubuntu-latest

env:
UV_SYSTEM_PYTHON: "1"

steps:
- uses: actions/checkout@v4

- name: Install uv
uses: astral-sh/setup-uv@v5
with:
enable-cache: true
cache-dependency-glob: "uv.lock"

- name: "Set up Python"
uses: actions/setup-python@v5
with:
python-version: "3.12"
python-version-file: "pyproject.toml"

- name: Install requirements
run: uv sync --all-extras

- name: Lint code
run: uv run pre-commit run --all-files

- name: Type check
run: uv run mypy src
- name: Run tests
run: uv run pytest tests --ignore=tests/e2e
120 changes: 34 additions & 86 deletions README.md
@@ -6,105 +6,53 @@ Open-source community offers a wide range of RAG-related frameworks focus on the

It comes with built-in monitoring and observability tools for better troubleshooting, integrated LLM-based metrics for evaluation, and human feedback collection capabilities. Whether you're building a lightweight knowledge base or an enterprise-grade application, this blueprint offers the flexibility and scalability needed for production deployments.

<div align="center">
<img src="res/readme/Architecture.png" width="1200">
<p><em>Figure 1: High-level architecture of the RAG Blueprint framework showing the main components and data flow</em></p>
</div>
![Architecture](res/readme/Architecture.png)
*Figure 1: High-level architecture of the RAG Blueprint framework showing the main components and data flow*

## 🚀 Features

- **Multiple Knowledge Base Integration**: Seamless extraction from several Data Sources(Confluence, Notion, PDF)
- **Wide Models Support**: Availability of numerous embedding and language models
- **Vector Search**: Efficient similarity search using vector stores
- **Interactive Chat**: User-friendly interface for querying knowledge on [Chainlit](https://chainlit.io/)
- **Performance Monitoring**: Query and response tracking with [Langfuse](https://langfuse.com/)
- **Evaluation**: Comprehensive evaluation metrics using [RAGAS](https://docs.ragas.io/en/stable/)
- **Setup flexibility**: Easy and flexible setup process of the pipeline
- **Hybrid Retrieval**: Improved retrieval accuracy by combining semantic vector search and keyword-based (BM25) search with Query Fusion.
- **ColBERT Reranking**: Advanced post-processing using ColBERT reranker for superior precision in top results.
- **Multiple Knowledge Base Integration**: Seamless extraction from several Data Sources (Confluence, Notion, PDF).
- **Wide Models Support**: Availability of numerous embedding and language models.
- **Interactive Chat**: User-friendly interface for querying knowledge on [Chainlit](https://chainlit.io/).
- **Performance Monitoring**: Query and response tracking with [Langfuse](https://langfuse.com/).
- **Evaluation**: Comprehensive evaluation metrics using [RAGAS](https://docs.ragas.io/en/stable/).
- **Setup flexibility**: Easy and flexible setup process of the pipeline.

## 🛠️ Tech Stack

### Core
[Python](https://www.python.org/) • [LlamaIndex](https://www.llamaindex.ai/) • [Chainlit](https://chainlit.io/) • [Langfuse](https://langfuse.com/) • [RAGAS](https://docs.ragas.io/)

---
### Components
- **Retriever**: Basic Vector, Hybrid (Vector + BM25)
- **Postprocessor**: ColBERT Reranker, Metadata filters
- **LLMs**: OpenAI, Anthropic, HuggingFace, Local (via Ollama)

### Data Sources
[Notion](https://developers.notion.com/) • [Confluence](https://developer.atlassian.com/cloud/confluence/rest/v2/intro/#about) • PDF files • [BundestagMine](https://bundestag-mine.de/api/documentation/index.html)
## 📖 Usage

---

### Embedding Models
[VoyageAI](https://www.voyageai.com/) • [OpenAI](https://openai.com/) • [Hugging Face](https://huggingface.co/)

---

### Language Models
[LiteLLM](https://docs.litellm.ai/) - Access to many LLMs via providers like **OpenAI**, **Google**, or **Anthropic**, as well as local LLMs

---

### Vector Stores
[Qdrant](https://qdrant.tech/) • [Chroma](https://www.trychroma.com/) • [PGVector](https://github.com/pgvector)


---

### Infrastructure
[PostgreSQL](https://www.postgresql.org/) • [Docker](https://www.docker.com/)


## 🚀 Quickstart

Check the detailed [Quickstart Setup](https://feld-m.github.io/rag_blueprint/quickstart/quickstart_setup/)

## 🏗️ Architecture

### Data Flow

1. **Extraction**:
   - Fetches content from the respective data sources
   - Preprocesses retrieved resources and parses them into markdown

2. **Embedding**:
   - Applies markdown-aware splitting
   - Embeds the resulting nodes using the selected embedding model
   - Saves the embeddings in the selected vector store

3. **Augmentation**:
   - Defines the retrieval and augmentation pipeline encapsulated in a chat engine
   - Integrates Chainlit for the UI interface
   - Integrates Langfuse for observability of generated responses and user queries

4. **Evaluation**:
   - Uses the Chainlit and Langfuse platforms for gathering human feedback
   - Employs the Ragas package for evaluating the performance of the current setup

For more info, refer to the specific readmes of [Extraction](/src/extraction/README.md), [Embedding](/src/embedding/README.md), [Augmentation](/src/augmentation/README.md) and [Evaluation](/src/evaluation/README.md).
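The embedding step above splits documents into token chunks with a fixed overlap (the hybrid configuration uses `chunk_size_in_tokens: 384` and `chunk_overlap_in_tokens: 50`). A minimal pure-Python sketch of the overlap mechanics — the function name is illustrative and the real splitter additionally respects markdown structure:

```python
def chunk_tokens(tokens, chunk_size=384, overlap=50):
    """Split a token sequence into overlapping chunks.

    Mirrors the splitter settings in the hybrid configuration
    (chunk_size_in_tokens=384, chunk_overlap_in_tokens=50), but ignores
    the markdown-aware boundaries the blueprint's real splitter respects.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the final chunk already reaches the end of the sequence
    return chunks
```

The overlap means the last 50 tokens of each chunk reappear at the head of the next one, so sentences straddling a chunk boundary stay retrievable.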

### Integrations

For the user interface the codebase uses [Chainlit](https://chainlit.io/), which is integrated with [Langfuse](https://langfuse.com/) for observability and tracing of the system. Moreover, this integration enables building evaluation datasets based on user feedback on the system's answers. Feedback is saved in Langfuse datasets and later used by the [Evaluation](/src/evaluation/README.md) module.

## 📁 Project Structure
### Prerequisites
Install dependencies:
```bash
pip install .[all]
```

### Basic Start
To start the Chainlit UI with the default configuration:
```bash
python -m src.augmentation.app --env default
```
.
├── build/                # Build and deployment scripts
│   └── workstation/      # Build scripts for workstation setup
├── configurations/       # Configuration and secrets files
├── data/                 # Data for local testing
├── res/                  # Assets
├── src/                  # Source code
│   ├── augmentation/     # Chainlit, Langfuse, and RAG processing components
│   ├── core/             # Base package
│   ├── extraction/       # Data sources extraction
│   ├── embedding/        # Data embedding
│   └── evaluate/         # Evaluation system
└── tests/                # Unit tests

### Hybrid Retrieval Start
To use the new Hybrid Retrieval with ColBERT reranking:
```bash
python -m src.augmentation.app --env hybrid
```
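The hybrid retriever combines semantic vector search and keyword-based BM25 search via Query Fusion. One common fusion strategy for merging the two ranked lists is reciprocal rank fusion (RRF); this is a stdlib-only sketch of the idea, not the blueprint's actual fusion code:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists into one.

    Each ranking is a list of document ids, best first. A document's
    fused score is the sum of 1 / (k + rank) over every list it appears
    in, so items ranked highly by either retriever bubble to the top.
    k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: vector search and BM25 partially disagree; fusion balances them
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

`doc_a` wins because both retrievers rank it highly, while `doc_b` and `doc_d` each appear in only one list. The top-ranked fused results are then handed to the ColBERT reranker for final ordering.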

## 📚 Documentation
## ⚙️ Configuration

The project uses a modular configuration system based on Pydantic. Configuration files are located in the `configurations/` directory and are selected using the `--env` flag.

For detailed documentation on setup, configuration, and development:
- [Documentation Site](https://feld-m.github.io/rag_blueprint/)
- [Quickstart Setup](https://feld-m.github.io/rag_blueprint/quickstart/quickstart_setup/)
- `configuration.default.json`: Standard vector search
- `configuration.hybrid.json`: Hybrid search (Vector + BM25) with ColBERT reranking
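The `--env` flag maps to a file named `configuration.<env>.json` in the `configurations/` directory. A minimal sketch of that resolution rule, assuming the naming convention above — the blueprint's real loader additionally validates the file against Pydantic models, and `resolve_configuration_path` is an illustrative name:

```python
import argparse
from pathlib import Path


def resolve_configuration_path(argv=None, config_dir="configurations"):
    """Map the --env flag to a configuration file path.

    Illustrative only: shows the --env -> filename convention, not the
    blueprint's actual Pydantic-based configuration loading.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", default="default")
    args = parser.parse_args(argv)
    return Path(config_dir) / f"configuration.{args.env}.json"


# --env hybrid selects the hybrid search configuration
print(resolve_configuration_path(["--env", "hybrid"]))
```

Omitting `--env` falls back to `configuration.default.json`, matching the Basic Start command shown earlier.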
89 changes: 89 additions & 0 deletions configurations/configuration.hybrid.json
@@ -0,0 +1,89 @@
{
    "extraction": {
        "datasources": [
            {
                "name": "pdf",
                "export_limit": 5,
                "base_path": "data/bavarian_beer"
            }
        ]
    },
    "embedding": {
        "vector_store": {
            "name": "qdrant",
            "collection_name": "embeddings",
            "host": "qdrant",
            "port": 6333
        },
        "embedding_model": {
            "provider": "hugging_face",
            "name": "BAAI/bge-small-en-v1.5",
            "tokenizer_name": "BAAI/bge-small-en-v1.5",
            "splitter": {
                "chunk_overlap_in_tokens": 50,
                "chunk_size_in_tokens": 384
            }
        }
    },
    "augmentation": {
        "chat_engine": {
            "name": "langfuse",
            "guardrails": {
                "llm": {
                    "provider": "lite_llm",
                    "name": "gpt-4o-mini",
                    "max_tokens": 1024,
                    "max_retries": 3,
                    "context_window": 16384
                }
            },
            "retriever": {
                "name": "hybrid",
                "similarity_top_k": 10
            },
            "llm": {
                "provider": "lite_llm",
                "name": "gpt-4o-mini",
                "max_tokens": 1024,
                "max_retries": 3,
                "context_window": 16384
            },
            "postprocessors": [
                {
                    "name": "colbert_rerank",
                    "top_n": 5,
                    "model": "colbert-ir/colbertv2.0",
                    "tokenizer": "colbert-ir/colbertv2.0",
                    "keep_retrieval_score": true
                }
            ]
        },
        "langfuse": {
            "host": "langfuse",
            "protocol": "http",
            "port": 3000,
            "database": {
                "host": "langfuse-db",
                "port": 5432,
                "db": "langfuse"
            }
        },
        "chainlit": {
            "port": 8000
        }
    },
    "evaluation": {
        "judge_llm": {
            "provider": "lite_llm",
            "name": "gpt-4o-mini",
            "max_tokens": 1024,
            "max_retries": 3,
            "context_window": 16384
        },
        "judge_embedding_model": {
            "provider": "hugging_face",
            "name": "BAAI/bge-small-en-v1.5",
            "tokenizer_name": "BAAI/bge-small-en-v1.5"
        }
    }
}
9 changes: 5 additions & 4 deletions pyproject.toml
@@ -11,6 +11,7 @@ dependencies = [
"pytest-asyncio>=0.26.0",
"pytest-mock>=3.14.0",
"python-on-whales>=0.76.1",
"mypy>=1.11.0",
]

[project.optional-dependencies]
@@ -44,17 +45,17 @@ embedding = [
"llama-index-vector-stores-qdrant>=0.4.3",
"psycopg2-binary>=2.9.10",
"transformers>=4.49.0",
"torch>=2.0.0", # Note: Install via pip separately on macOS Intel
"torch>=2.0.0", # Note: Install via pip separately on macOS Intel
]
augmentation = [
"chainlit>=2.3.0",
"langfuse>=2.60.2",
"llama-index-callbacks-langfuse>=0.3.0",
"llama-index-llms-litellm>=0.4.2",
"llama-index-postprocessor-colbert-rerank>=0.3.0",
"llama-index-retrievers-bm25>=0.3.0",
]
evaluation = ["ragas==0.1.14"]

all = [
"rag-blueprint[core]",
"rag-blueprint[extraction]",
@@ -73,8 +74,8 @@ packages = ["src/"]
[tool.uv]
# Override resolution to use system-installed torch on macOS Intel
override-dependencies = [
"torch==2.2.2", # Use pip-installed torch for macOS Intel compatibility
"torch==2.2.2", # Use pip-installed torch for macOS Intel compatibility
]
constraint-dependencies = [
"numpy<2", # torch 2.2.2 requires numpy 1.x
"numpy<2", # torch 2.2.2 requires numpy 1.x
]
47 changes: 47 additions & 0 deletions src/augmentation/app.py
@@ -0,0 +1,47 @@
import logging

import chainlit as cl
from augmentation.bootstrap.initializer import AugmentationInitializer
from augmentation.components.chat_engines.langfuse.chat_engine import (
    LangfuseChatEngineFactory,
    SourceProcess,
)

# Initialize the system once at startup
initializer = AugmentationInitializer()
configuration = initializer.get_configuration()


@cl.on_chat_start
async def start():
    # Use the factory to create the chat engine
    chat_engine = LangfuseChatEngineFactory.create(configuration)

    # Store the chat engine in the user session
    cl.user_session.set("chat_engine", chat_engine)

    await cl.Message(
        content="Hello! I'm your RAG-powered assistant. How can I help you today?"
    ).send()


@cl.on_message
async def main(message: cl.Message):
    chat_engine = cl.user_session.get("chat_engine")

    # Process the message with streaming support
    msg = cl.Message(content="")

    # Link the Langfuse session ID here
    chat_engine.set_session_id(cl.user_session.get("id"))

    # Use the stream_chat method, which includes Langfuse tracing
    response = chat_engine.stream_chat(
        message.content,
        chainlit_message_id=message.id,
        source_process=SourceProcess.CHAT_COMPLETION,
    )

    # Stream tokens if the response supports it, otherwise send the full text
    if hasattr(response, "response_gen") and response.response_gen:
        for token in response.response_gen:
            await msg.stream_token(token)
    else:
        msg.content = response.response

    await msg.send()
@@ -13,6 +13,7 @@ class RetrieverName(str, Enum):
BASIC = "basic"
AUTO = "auto"
DYNAMIC_TEMPORAL = "dynamic_temporal"
HYBRID = "hybrid"


class RetrieverConfiguration(BaseConfiguration):
13 changes: 13 additions & 0 deletions src/augmentation/components/retrievers/hybrid/__init__.py
@@ -0,0 +1,13 @@
from augmentation.bootstrap.configuration.components.retriever_configuration import (
RetrieverConfigurationRegistry,
RetrieverName,
)
from augmentation.components.retrievers.hybrid.retriever import (
HybridRetrieverFactory,
)
from augmentation.components.retrievers.registry import RetrieverRegistry


def register() -> None:
    """Register Hybrid Retriever components with the system."""
    RetrieverRegistry.register(RetrieverName.HYBRID, HybridRetrieverFactory)
16 changes: 16 additions & 0 deletions src/augmentation/components/retrievers/hybrid/configuration.py
@@ -0,0 +1,16 @@
from typing import Literal
from pydantic import Field
from augmentation.bootstrap.configuration.components.retriever_configuration import (
RetrieverConfiguration,
RetrieverName,
)

class HybridRetrieverConfiguration(RetrieverConfiguration):
    """
    Configuration for the Hybrid Retriever component.

    This class defines the configuration parameters needed for initializing
    and operating the hybrid retriever, extending the base RetrieverConfiguration.
    """

    name: Literal[RetrieverName.HYBRID] = Field(
        ..., description="The name of the retriever."
    )