11 changes: 2 additions & 9 deletions .github/workflows/ci.yml
@@ -1,39 +1,32 @@
name: CI

on:
push:
branches:
- main
pull_request:

jobs:
run-tests:
name: Linting and Unit Tests
runs-on: ubuntu-latest

env:
UV_SYSTEM_PYTHON: "1"

steps:
- uses: actions/checkout@v4

- name: Install uv
uses: astral-sh/setup-uv@v5
with:
enable-cache: true
cache-dependency-glob: "uv.lock"

- name: "Set up Python"
uses: actions/setup-python@v5
with:
python-version: "3.12"
python-version-file: "pyproject.toml"

- name: Install requirements
run: uv sync --all-extras

- name: Lint code
run: uv run pre-commit run --all-files

- name: Type check
run: uv run mypy src
- name: Run tests
run: uv run pytest tests --ignore=tests/e2e
120 changes: 34 additions & 86 deletions README.md
@@ -6,105 +6,53 @@ Open-source community offers a wide range of RAG-related frameworks focus on the

It comes with built-in monitoring and observability tools for better troubleshooting, integrated LLM-based metrics for evaluation, and human feedback collection capabilities. Whether you're building a lightweight knowledge base or an enterprise-grade application, this blueprint offers the flexibility and scalability needed for production deployments.

<div align="center">
<img src="res/readme/Architecture.png" width="1200">
<p><em>Figure 1: High-level architecture of the RAG Blueprint framework showing the main components and data flow</em></p>
</div>
![Architecture](res/readme/Architecture.png)
*Figure 1: High-level architecture of the RAG Blueprint framework showing the main components and data flow*

## 🚀 Features

- **Multiple Knowledge Base Integration**: Seamless extraction from several Data Sources(Confluence, Notion, PDF)
- **Wide Models Support**: Availability of numerous embedding and language models
- **Vector Search**: Efficient similarity search using vector stores
- **Interactive Chat**: User-friendly interface for querying knowledge on [Chainlit](https://chainlit.io/)
- **Performance Monitoring**: Query and response tracking with [Langfuse](https://langfuse.com/)
- **Evaluation**: Comprehensive evaluation metrics using [RAGAS](https://docs.ragas.io/en/stable/)
- **Setup flexibility**: Easy and flexible setup process of the pipeline
- **Hybrid Retrieval**: Improved retrieval accuracy by combining semantic vector search and keyword-based (BM25) search with Query Fusion.
- **ColBERT Reranking**: Advanced post-processing using ColBERT reranker for superior precision in top results.
- **Multiple Knowledge Base Integration**: Seamless extraction from several Data Sources (Confluence, Notion, PDF).
- **Wide Models Support**: Availability of numerous embedding and language models.
- **Interactive Chat**: User-friendly interface for querying knowledge on [Chainlit](https://chainlit.io/).
- **Performance Monitoring**: Query and response tracking with [Langfuse](https://langfuse.com/).
- **Evaluation**: Comprehensive evaluation metrics using [RAGAS](https://docs.ragas.io/en/stable/).
- **Setup flexibility**: Easy and flexible setup process of the pipeline.

## 🛠️ Tech Stack

### Core
[Python](https://www.python.org/) • [LlamaIndex](https://www.llamaindex.ai/) • [Chainlit](https://chainlit.io/) • [Langfuse](https://langfuse.com/) • [RAGAS](https://docs.ragas.io/)

---
### Components
- **Retriever**: Basic Vector, Hybrid (Vector + BM25)
- **Postprocessor**: ColBERT Reranker, Metadata filters
- **LLMs**: OpenAI, Anthropic, HuggingFace, Local (via Ollama)

### Data Sources
[Notion](https://developers.notion.com/) • [Confluence](https://developer.atlassian.com/cloud/confluence/rest/v2/intro/#about) • PDF files • [BundestagMine](https://bundestag-mine.de/api/documentation/index.html)
## 📖 Usage

---

### Embedding Models
[VoyageAI](https://www.voyageai.com/) • [OpenAI](https://openai.com/) • [Hugging Face](https://huggingface.co/)

---

### Language Models
[LiteLLM](https://docs.litellm.ai/) - Access to many LLMs via providers like **OpenAI**, **Google**, or **Anthropic**, as well as local LLMs

---

### Vector Stores
[Qdrant](https://qdrant.tech/) • [Chroma](https://www.trychroma.com/) • [PGVector](https://github.com/pgvector)


---

### Infrastructure
[PostgreSQL](https://www.postgresql.org/) • [Docker](https://www.docker.com/)


## 🚀 Quickstart

Check the detailed [Quickstart Setup](https://feld-m.github.io/rag_blueprint/quickstart/quickstart_setup/)

## 🏗️ Architecture

### Data Flow

1. **Extraction**:
   - Fetches content from the respective data sources
   - Preprocesses retrieved resources and parses them into markdown

2. **Embedding**:
   - Applies markdown-aware splitting
   - Embeds the resulting nodes using the selected embedding model
   - Saves the embeddings in the selected vector store

3. **Augmentation**:
   - Defines the retrieval and augmentation pipeline encapsulated in a chat engine
   - Integrates Chainlit for the UI interface
   - Integrates Langfuse for observability of generated responses and user queries

4. **Evaluation**:
   - Uses the Chainlit and Langfuse platforms for gathering human feedback
   - Employs the Ragas package for evaluating the performance of the current setup

For more info, refer to the specific readmes of [Extraction](/src/extraction/README.md), [Embedding](/src/embedding/README.md), [Augmentation](/src/augmentation/README.md) and [Evaluation](/src/evaluation/README.md).
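The embedding step above splits documents into token chunks with a fixed overlap (the hybrid configuration uses `chunk_size_in_tokens: 384` and `chunk_overlap_in_tokens: 50`). A minimal pure-Python sketch of the overlap mechanics — the function name is illustrative and the real splitter additionally respects markdown structure:

```python
def chunk_tokens(tokens, chunk_size=384, overlap=50):
    """Split a token sequence into overlapping chunks.

    Mirrors the splitter settings in the hybrid configuration
    (chunk_size_in_tokens=384, chunk_overlap_in_tokens=50), but ignores
    the markdown-aware boundaries the blueprint's real splitter respects.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the final chunk already reaches the end of the sequence
    return chunks
```

The overlap means the last 50 tokens of each chunk reappear at the head of the next one, so sentences straddling a chunk boundary stay retrievable.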

### Integrations

For the user interface the codebase uses [Chainlit](https://chainlit.io/), which is integrated with [Langfuse](https://langfuse.com/) for observability and tracing of the system. Moreover, this integration enables building evaluation datasets based on user feedback on the system's answers. Feedback is saved in Langfuse datasets and later used by the [Evaluation](/src/evaluation/README.md) module.

## 📁 Project Structure
### Prerequisites
Install dependencies:
```bash
pip install .[all]
```

### Basic Start
To start the Chainlit UI with the default configuration:
```bash
python -m src.augmentation.app --env default
```
.
├── build/                # Build and deployment scripts
│   └── workstation/      # Build scripts for workstation setup
├── configurations/       # Configuration and secrets files
├── data/                 # Data for local testing
├── res/                  # Assets
├── src/                  # Source code
│   ├── augmentation/     # Chainlit, Langfuse, and RAG processing components
│   ├── core/             # Base package
│   ├── extraction/       # Data sources extraction
│   ├── embedding/        # Data embedding
│   └── evaluate/         # Evaluation system
└── tests/                # Unit tests

### Hybrid Retrieval Start
To use the new Hybrid Retrieval with ColBERT reranking:
```bash
python -m src.augmentation.app --env hybrid
```
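The hybrid retriever combines semantic vector search and keyword-based BM25 search via Query Fusion. One common fusion strategy for merging the two ranked lists is reciprocal rank fusion (RRF); this is a stdlib-only sketch of the idea, not the blueprint's actual fusion code:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists into one.

    Each ranking is a list of document ids, best first. A document's
    fused score is the sum of 1 / (k + rank) over every list it appears
    in, so items ranked highly by either retriever bubble to the top.
    k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: vector search and BM25 partially disagree; fusion balances them
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

`doc_a` wins because both retrievers rank it highly, while `doc_b` and `doc_d` each appear in only one list. The top-ranked fused results are then handed to the ColBERT reranker for final ordering.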

## 📚 Documentation
## ⚙️ Configuration

The project uses a modular configuration system based on Pydantic. Configuration files are located in the `configurations/` directory and are selected using the `--env` flag.

For detailed documentation on setup, configuration, and development:
- [Documentation Site](https://feld-m.github.io/rag_blueprint/)
- [Quickstart Setup](https://feld-m.github.io/rag_blueprint/quickstart/quickstart_setup/)
- `configuration.default.json`: Standard vector search
- `configuration.hybrid.json`: Hybrid search (Vector + BM25) with ColBERT reranking
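The `--env` flag maps to a file named `configuration.<env>.json` in the `configurations/` directory. A minimal sketch of that resolution rule, assuming the naming convention above — the blueprint's real loader additionally validates the file against Pydantic models, and `resolve_configuration_path` is an illustrative name:

```python
import argparse
from pathlib import Path


def resolve_configuration_path(argv=None, config_dir="configurations"):
    """Map the --env flag to a configuration file path.

    Illustrative only: shows the --env -> filename convention, not the
    blueprint's actual Pydantic-based configuration loading.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", default="default")
    args = parser.parse_args(argv)
    return Path(config_dir) / f"configuration.{args.env}.json"


# --env hybrid selects the hybrid search configuration
print(resolve_configuration_path(["--env", "hybrid"]))
```

Omitting `--env` falls back to `configuration.default.json`, matching the Basic Start command shown earlier.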
89 changes: 89 additions & 0 deletions configurations/configuration.hybrid.json
@@ -0,0 +1,89 @@
{
    "extraction": {
        "datasources": [
            {
                "name": "pdf",
                "export_limit": 5,
                "base_path": "data/bavarian_beer"
            }
        ]
    },
    "embedding": {
        "vector_store": {
            "name": "qdrant",
            "collection_name": "embeddings",
            "host": "qdrant",
            "port": 6333
        },
        "embedding_model": {
            "provider": "hugging_face",
            "name": "BAAI/bge-small-en-v1.5",
            "tokenizer_name": "BAAI/bge-small-en-v1.5",
            "splitter": {
                "chunk_overlap_in_tokens": 50,
                "chunk_size_in_tokens": 384
            }
        }
    },
    "augmentation": {
        "chat_engine": {
            "name": "langfuse",
            "guardrails": {
                "llm": {
                    "provider": "lite_llm",
                    "name": "gpt-4o-mini",
                    "max_tokens": 1024,
                    "max_retries": 3,
                    "context_window": 16384
                }
            },
            "retriever": {
                "name": "hybrid",
                "similarity_top_k": 10
            },
            "llm": {
                "provider": "lite_llm",
                "name": "gpt-4o-mini",
                "max_tokens": 1024,
                "max_retries": 3,
                "context_window": 16384
            },
            "postprocessors": [
                {
                    "name": "colbert_rerank",
                    "top_n": 5,
                    "model": "colbert-ir/colbertv2.0",
                    "tokenizer": "colbert-ir/colbertv2.0",
                    "keep_retrieval_score": true
                }
            ]
        },
        "langfuse": {
            "host": "langfuse",
            "protocol": "http",
            "port": 3000,
            "database": {
                "host": "langfuse-db",
                "port": 5432,
                "db": "langfuse"
            }
        },
        "chainlit": {
            "port": 8000
        }
    },
    "evaluation": {
        "judge_llm": {
            "provider": "lite_llm",
            "name": "gpt-4o-mini",
            "max_tokens": 1024,
            "max_retries": 3,
            "context_window": 16384
        },
        "judge_embedding_model": {
            "provider": "hugging_face",
            "name": "BAAI/bge-small-en-v1.5",
            "tokenizer_name": "BAAI/bge-small-en-v1.5"
        }
    }
}
9 changes: 5 additions & 4 deletions pyproject.toml
@@ -11,6 +11,7 @@ dependencies = [
"pytest-asyncio>=0.26.0",
"pytest-mock>=3.14.0",
"python-on-whales>=0.76.1",
"mypy>=1.11.0",
]

[project.optional-dependencies]
@@ -44,17 +45,17 @@ embedding = [
"llama-index-vector-stores-qdrant>=0.4.3",
"psycopg2-binary>=2.9.10",
"transformers>=4.49.0",
"torch>=2.0.0", # Note: Install via pip separately on macOS Intel
"torch>=2.0.0", # Note: Install via pip separately on macOS Intel
]
augmentation = [
"chainlit>=2.3.0",
"langfuse>=2.60.2",
"llama-index-callbacks-langfuse>=0.3.0",
"llama-index-llms-litellm>=0.4.2",
"llama-index-postprocessor-colbert-rerank>=0.3.0",
"llama-index-retrievers-bm25>=0.3.0",
]
evaluation = ["ragas==0.1.14"]

all = [
"rag-blueprint[core]",
"rag-blueprint[extraction]",
@@ -73,8 +74,8 @@ packages = ["src/"]
[tool.uv]
# Override resolution to use system-installed torch on macOS Intel
override-dependencies = [
"torch==2.2.2", # Use pip-installed torch for macOS Intel compatibility
"torch==2.2.2", # Use pip-installed torch for macOS Intel compatibility
]
constraint-dependencies = [
"numpy<2", # torch 2.2.2 requires numpy 1.x
"numpy<2", # torch 2.2.2 requires numpy 1.x
]
47 changes: 47 additions & 0 deletions src/augmentation/app.py
@@ -0,0 +1,47 @@
import logging

import chainlit as cl
from augmentation.bootstrap.initializer import AugmentationInitializer
from augmentation.components.chat_engines.langfuse.chat_engine import (
    LangfuseChatEngineFactory,
    SourceProcess,
)

# Initialize the system once at startup
initializer = AugmentationInitializer()
configuration = initializer.get_configuration()


@cl.on_chat_start
async def start():
    # Use the factory to create the chat engine
    chat_engine = LangfuseChatEngineFactory.create(configuration)

    # Store the chat engine in the user session
    cl.user_session.set("chat_engine", chat_engine)

    await cl.Message(
        content="Hello! I'm your RAG-powered assistant. How can I help you today?"
    ).send()


@cl.on_message
async def main(message: cl.Message):
    chat_engine = cl.user_session.get("chat_engine")

    # Process the message with streaming support
    msg = cl.Message(content="")

    # Link the Langfuse session ID here
    chat_engine.set_session_id(cl.user_session.get("id"))

    # Use the stream_chat method, which includes Langfuse tracing
    response = chat_engine.stream_chat(
        message.content,
        chainlit_message_id=message.id,
        source_process=SourceProcess.CHAT_COMPLETION,
    )

    # Stream tokens if the response supports it, otherwise send the full text
    if hasattr(response, "response_gen") and response.response_gen:
        for token in response.response_gen:
            await msg.stream_token(token)
    else:
        msg.content = response.response

    await msg.send()
@@ -13,6 +13,7 @@ class RetrieverName(str, Enum):
BASIC = "basic"
AUTO = "auto"
DYNAMIC_TEMPORAL = "dynamic_temporal"
HYBRID = "hybrid"


class RetrieverConfiguration(BaseConfiguration):
13 changes: 13 additions & 0 deletions src/augmentation/components/retrievers/hybrid/__init__.py
@@ -0,0 +1,13 @@
from augmentation.bootstrap.configuration.components.retriever_configuration import (
RetrieverConfigurationRegistry,
RetrieverName,
)
from augmentation.components.retrievers.hybrid.retriever import (
HybridRetrieverFactory,
)
from augmentation.components.retrievers.registry import RetrieverRegistry


def register() -> None:
    """Register Hybrid Retriever components with the system."""
    RetrieverRegistry.register(RetrieverName.HYBRID, HybridRetrieverFactory)
16 changes: 16 additions & 0 deletions src/augmentation/components/retrievers/hybrid/configuration.py
@@ -0,0 +1,16 @@
from typing import Literal
from pydantic import Field
from augmentation.bootstrap.configuration.components.retriever_configuration import (
RetrieverConfiguration,
RetrieverName,
)

class HybridRetrieverConfiguration(RetrieverConfiguration):
    """
    Configuration for the Hybrid Retriever component.

    This class defines the configuration parameters needed for initializing
    and operating the hybrid retriever, extending the base RetrieverConfiguration.
    """

    name: Literal[RetrieverName.HYBRID] = Field(
        ..., description="The name of the retriever."
    )