diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 9aab600..5f0eaad 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -1,39 +1,32 @@
name: CI
-
on:
push:
branches:
- main
pull_request:
-
jobs:
run-tests:
name: Linting and Unit Tests
runs-on: ubuntu-latest
-
env:
UV_SYSTEM_PYTHON: "1"
-
steps:
- uses: actions/checkout@v4
-
- name: Install uv
uses: astral-sh/setup-uv@v5
with:
enable-cache: true
cache-dependency-glob: "uv.lock"
-
- name: "Set up Python"
uses: actions/setup-python@v5
with:
python-version: "3.12"
python-version-file: "pyproject.toml"
-
- name: Install requirements
run: uv sync --all-extras
-
- name: Lint code
run: uv run pre-commit run --all-files
-
+ - name: Type check
+ run: uv run mypy src
- name: Run tests
run: uv run pytest tests --ignore=tests/e2e
diff --git a/README.md b/README.md
index 94ba23b..1ce2401 100644
--- a/README.md
+++ b/README.md
@@ -6,105 +6,53 @@ Open-source community offers a wide range of RAG-related frameworks focus on the
It comes with built-in monitoring and observability tools for better troubleshooting, integrated LLM-based metrics for evaluation, and human feedback collection capabilities. Whether you're building a lightweight knowledge base or an enterprise-grade application, this blueprint offers the flexibility and scalability needed for production deployments.
-
-

-
Figure 1: High-level architecture of the RAG Blueprint framework showing the main components and data flow
-
+
+*Figure 1: High-level architecture of the RAG Blueprint framework showing the main components and data flow*
## 🚀 Features
-- **Multiple Knowledge Base Integration**: Seamless extraction from several Data Sources(Confluence, Notion, PDF)
-- **Wide Models Support**: Availability of numerous embedding and language models
-- **Vector Search**: Efficient similarity search using vector stores
-- **Interactive Chat**: User-friendly interface for querying knowledge on [Chainlit](https://chainlit.io/)
-- **Performance Monitoring**: Query and response tracking with [Langfuse](https://langfuse.com/)
-- **Evaluation**: Comprehensive evaluation metrics using [RAGAS](https://docs.ragas.io/en/stable/)
-- **Setup flexibility**: Easy and flexible setup process of the pipeline
+- **Hybrid Retrieval**: Improved retrieval accuracy by combining semantic vector search and keyword-based (BM25) search with Query Fusion.
+- **ColBERT Reranking**: Advanced post-processing using ColBERT reranker for superior precision in top results.
+- **Multiple Knowledge Base Integration**: Seamless extraction from several Data Sources (Confluence, Notion, PDF).
+- **Wide Models Support**: Availability of numerous embedding and language models.
+- **Interactive Chat**: User-friendly interface for querying knowledge on [Chainlit](https://chainlit.io/).
+- **Performance Monitoring**: Query and response tracking with [Langfuse](https://langfuse.com/).
+- **Evaluation**: Comprehensive evaluation metrics using [RAGAS](https://docs.ragas.io/en/stable/).
+- **Setup flexibility**: Easy and flexible setup process of the pipeline.
## 🛠️ Tech Stack
### Core
[Python](https://www.python.org/) • [LlamaIndex](https://www.llamaindex.ai/) • [Chainlit](https://chainlit.io/) • [Langfuse](https://langfuse.com/) • [RAGAS](https://docs.ragas.io/)
----
+### Components
+- **Retriever**: Basic Vector, Hybrid (Vector + BM25)
+- **Postprocessor**: ColBERT Reranker, Metadata filters
+- **LLMs**: OpenAI, Anthropic, HuggingFace, Local (via Ollama)
-### Data Sources
-[Notion](https://developers.notion.com/) • [Confluence](https://developer.atlassian.com/cloud/confluence/rest/v2/intro/#about) • PDF files • [BundestagMine](https://bundestag-mine.de/api/documentation/index.html)
+## 📖 Usage
----
-
-### Embedding Models
-[VoyageAI](https://www.voyageai.com/) • [OpenAI](https://openai.com/) • [Hugging Face](https://huggingface.co/)
-
----
-
-### Language Models
-[LiteLLM](https://docs.litellm.ai/) - Availability of many LLMs via providers like **OpenAI**, **Google** or **Anthropic** as well as local LLMs
-
----
-
-### Vector Stores
-[Qdrant](https://qdrant.tech/) • [Chroma](https://www.trychroma.com/) • [PGVector](https://github.com/pgvector)
-
-
----
-
-### Infrastructure
-[PostgreSQL](https://www.postgresql.org/) • [Docker](https://www.docker.com/)
-
-
-## 🚀 Quickstart
-
-Check the detailed [Quickstart Setup](https://feld-m.github.io/rag_blueprint/quickstart/quickstart_setup/)
-
-## 🏗️ Architecture
-
-### Data Flow
-
-1. **Extraction**:
- - Fetches content from respective data sources
- - Preprocess retrieved resources and parse it to markdown
-
-2. **Embedding**:
- - Applies markdown aware splitting
- - Embeds final nodes using the selected embedding model
- - Saves the embeddings in the selected vector store
-
-3. **Augmentation**
- - Defines retrieval and augmentation pipeline encapusalted in a chat engine
- - Integrates Chainlit for UI interface
- - Integrates Langfuse for observability of generated responses and user queries
-
-3. **Evaluation**:
- - Uses Chainlit and Langfuse platforms for gathering human feedback
- - Employs Ragas package for evaluating perfomance of current setup
-
-For more info refer to specific readmes of [Extraction](/src/extraction/README.md), [Embedding](/src/embedding/README.md), [Augmentation](/src/augmentation/README.md) and [Evaluation](/src/evaluation//README.md).
-
-### Integrations
-
-For user interface the codebase uses [Chainlit](https://chainlit.io/), which is integrated with [Langfuse](https://langfuse.com/) responsible for observability and tracing of the system. Moreover, integration enables building evaluation datasets based on the user feeback regarding the system answers. Feedback is saved in Langfuse datasets and later used by [Evaluation](/src/evaluation//README.md) module.
-
-## 📁 Project Structure
+### Prerequisites
+Install dependencies:
+```bash
+pip install .[all]
+```
+### Basic Start
+To start the Chainlit UI with the default configuration:
+```bash
+python -m src.augmentation.app --env default
```
-.
-├── build/ # Build and deployment scripts
-│ └── workstation/ # Build scripts for workstation setup
-├── configurations/ # Configuration and secrets files
-├── data/ # Data for local testing
-├── res/ # Assets
-└── src/ # Source code
- ├── augmentation/ # Chainlit, Langfuse, and RAG processing components
- ├── core/ # Base package
- ├── extraction/ # Data sources extraction
- ├── embedding/ # Data embedding
- └── evaluate/ # Evaluation system
-├── tests/ # Unit tests
+
+### Hybrid Retrieval Start
+To use the new Hybrid Retrieval with ColBERT reranking:
+```bash
+python -m src.augmentation.app --env hybrid
```
-## 📚 Documentation
+## ⚙️ Configuration
+
+The project uses a modular configuration system based on Pydantic. Configuration files are located in the `configurations/` directory and are selected using the `--env` flag.
-For detailed documentation on setup, configuration, and development:
-- [Documentation Site](https://feld-m.github.io/rag_blueprint/)
-- [Quickstart Setup](https://feld-m.github.io/rag_blueprint/quickstart/quickstart_setup/)
+- `configuration.default.json`: Standard vector search
+- `configuration.hybrid.json`: Hybrid search (Vector + BM25) with ColBERT reranking
diff --git a/configurations/configuration.hybrid.json b/configurations/configuration.hybrid.json
new file mode 100644
index 0000000..8a1b29b
--- /dev/null
+++ b/configurations/configuration.hybrid.json
@@ -0,0 +1,89 @@
+{
+ \"extraction\": {
+ \"datasources\": [
+ {
+ \"name\": \"pdf\",
+ \"export_limit\": 5,
+ \"base_path\": \"data/bavarian_beer\"
+ }
+ ]
+ },
+ \"embedding\": {
+ \"vector_store\": {
+ \"name\": \"qdrant\",
+ \"collection_name\": \"embeddings\",
+ \"host\": \"qdrant\",
+ \"port\": 6333
+ },
+ \"embedding_model\": {
+ \"provider\": \"hugging_face\",
+ \"name\": \"BAAI/bge-small-en-v1.5\",
+ \"tokenizer_name\": \"BAAI/bge-small-en-v1.5\",
+ \"splitter\": {
+ \"chunk_overlap_in_tokens\": 50,
+ \"chunk_size_in_tokens\": 384
+ }
+ }
+ },
+ \"augmentation\": {
+ \"chat_engine\": {
+ \"name\": \"langfuse\",
+ \"guardrails\": {
+ \"llm\": {
+ \"provider\": \"lite_llm\",
+ \"name\": \"gpt-4o-mini\",
+ \"max_tokens\": 1024,
+ \"max_retries\": 3,
+ \"context_window\": 16384
+ }
+ },
+ \"retriever\": {
+ \"name\": \"hybrid\",
+ \"similarity_top_k\": 10
+ },
+ \"llm\": {
+ \"provider\": \"lite_llm\",
+ \"name\": \"gpt-4o-mini\",
+ \"max_tokens\": 1024,
+ \"max_retries\": 3,
+ \"context_window\": 16384
+ },
+ \"postprocessors\": [
+ {
+ \"name\": \"colbert_rerank\",
+ \"top_n\": 5,
+ \"model\": \"colbert-ir/colbertv2.0\",
+ \"tokenizer\": \"colbert-ir/colbertv2.0\",
+ \"keep_retrieval_score\": true
+ }
+ ]
+ },
+ \"langfuse\": {
+ \"host\": \"langfuse\",
+ \"protocol\": \"http\",
+ \"port\": 3000,
+ \"database\": {
+ \"host\": \"langfuse-db\",
+ \"port\": 5432,
+ \"db\": \"langfuse\"
+ }
+ },
+ \"chainlit\": {feat(config): add hybrid retrieval configuration example
+ \"port\": 8000
+ }
+ },
+ \"evaluation\": {
+ \"judge_llm\": {
+ \"provider\": \"lite_llm\",
+ \"name\": \"gpt-4o-mini\",
+ \"max_tokens\": 1024,
+ \"max_retries\": 3,
+ \"context_window\": 16384
+ },
+ \"judge_embedding_model\": {
+ \"provider\": \"hugging_face\",
+ \"name\": \"BAAI/bge-small-en-v1.5\",
+ \"tokenizer_name\": \"BAAI/bge-small-en-v1.5\"
+ }
+ }
+}
diff --git a/pyproject.toml b/pyproject.toml
index eb8cf14..234cfa1 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -11,6 +11,7 @@ dependencies = [
"pytest-asyncio>=0.26.0",
"pytest-mock>=3.14.0",
"python-on-whales>=0.76.1",
+ "mypy>=1.11.0",
]
[project.optional-dependencies]
@@ -44,7 +45,7 @@ embedding = [
"llama-index-vector-stores-qdrant>=0.4.3",
"psycopg2-binary>=2.9.10",
"transformers>=4.49.0",
- "torch>=2.0.0", # Note: Install via pip separately on macOS Intel
+ "torch>=2.0.0", # Note: Install via pip separately on macOS Intel
]
augmentation = [
"chainlit>=2.3.0",
@@ -52,9 +53,9 @@ augmentation = [
"llama-index-callbacks-langfuse>=0.3.0",
"llama-index-llms-litellm>=0.4.2",
"llama-index-postprocessor-colbert-rerank>=0.3.0",
+ "llama-index-retrievers-bm25>=0.3.0",
]
evaluation = ["ragas==0.1.14"]
-
all = [
"rag-blueprint[core]",
"rag-blueprint[extraction]",
@@ -73,8 +74,8 @@ packages = ["src/"]
[tool.uv]
# Override resolution to use system-installed torch on macOS Intel
override-dependencies = [
- "torch==2.2.2", # Use pip-installed torch for macOS Intel compatibility
+ "torch==2.2.2", # Use pip-installed torch for macOS Intel compatibility
]
constraint-dependencies = [
- "numpy<2", # torch 2.2.2 requires numpy 1.x
+ "numpy<2", # torch 2.2.2 requires numpy 1.x
]
diff --git a/src/augmentation/app.py b/src/augmentation/app.py
new file mode 100644
index 0000000..73c0656
--- /dev/null
+++ b/src/augmentation/app.py
@@ -0,0 +1,47 @@
+import chainlit as cl
+from augmentation.bootstrap.initializer import AugmentationInitializer
+from augmentation.components.chat_engines.langfuse.chat_engine import (
+ LangfuseChatEngineFactory,
+ SourceProcess,
+)
+import logging
+
+# Initialize the system once at startup
+initializer = AugmentationInitializer()
+configuration = initializer.get_configuration()
+
+@cl.on_chat_start
+async def start():
+ # Use the factory to create the chat engine
+ chat_engine = LangfuseChatEngineFactory.create(configuration)
+
+ # Store the chat engine in the user session
+ cl.user_session.set("chat_engine", chat_engine)
+
+ await cl.Message(content="Hello! I'm your RAG-powered assistant. How can I help you today?").send()
+
+@cl.on_message
+async def main(message: cl.Message):
+ chat_engine = cl.user_session.get("chat_engine")
+
+ # Process message with streaming support
+ msg = cl.Message(content="")
+
+ # Langfuse session ID can be linked here
+ chat_engine.set_session_id(cl.user_session.get("id"))
+
+ # Use the stream_chat method which includes Langfuse tracing
+ response = chat_engine.stream_chat(
+ message.content,
+ chainlit_message_id=message.id,
+ source_process=SourceProcess.CHAT_COMPLETION
+ )
+
+ # Check if response is streaming or static
+ if hasattr(response, "response_gen") and response.response_gen:
+ for token in response.response_gen:
+ await msg.stream_token(token)
+ else:
+ msg.content = response.response
+
+ await msg.send()
diff --git a/src/augmentation/bootstrap/configuration/components/retriever_configuration.py b/src/augmentation/bootstrap/configuration/components/retriever_configuration.py
index c3e516a..db17afd 100644
--- a/src/augmentation/bootstrap/configuration/components/retriever_configuration.py
+++ b/src/augmentation/bootstrap/configuration/components/retriever_configuration.py
@@ -13,6 +13,7 @@ class RetrieverName(str, Enum):
BASIC = "basic"
AUTO = "auto"
DYNAMIC_TEMPORAL = "dynamic_temporal"
+ HYBRID = "hybrid"
class RetrieverConfiguration(BaseConfiguration):
diff --git a/src/augmentation/components/retrievers/hybrid/__init__.py b/src/augmentation/components/retrievers/hybrid/__init__.py
new file mode 100644
index 0000000..6763f51
--- /dev/null
+++ b/src/augmentation/components/retrievers/hybrid/__init__.py
@@ -0,0 +1,13 @@
+from augmentation.bootstrap.configuration.components.retriever_configuration import (
+ RetrieverConfigurationRegistry,
+ RetrieverName,
+)
+from augmentation.components.retrievers.hybrid.retriever import (
+ HybridRetrieverFactory,
+)
+from augmentation.components.retrievers.registry import RetrieverRegistry
+
+
+def register() -> None:
+ \"\"\"Register Hybrid Retriever components with the system.\"\"\"
+ RetrieverRegistry.register(RetrieverName.HYBRID, HybridRetrieverFactory)
diff --git a/src/augmentation/components/retrievers/hybrid/configuration.py b/src/augmentation/components/retrievers/hybrid/configuration.py
new file mode 100644
index 0000000..3375975
--- /dev/null
+++ b/src/augmentation/components/retrievers/hybrid/configuration.py
@@ -0,0 +1,16 @@
+from typing import Literal
+from pydantic import Field
+from augmentation.bootstrap.configuration.components.retriever_configuration import (
+ RetrieverConfiguration,
+ RetrieverName,
+)
+
+class HybridRetrieverConfiguration(RetrieverConfiguration):
+ \"\"\"
+ Configuration for the Hybrid Retriever component.
+ This class defines the configuration parameters needed for initializing
+ and operating the hybrid retriever, extending the base RetrieverConfiguration.
+ \"\"\"
+ name: Literal[RetrieverName.HYBRID] = Field(
+ ..., description=\"The name of the retriever.\"
+ )
diff --git a/src/augmentation/components/retrievers/hybrid/retriever.py b/src/augmentation/components/retrievers/hybrid/retriever.py
new file mode 100644
index 0000000..e4d857d
--- /dev/null
+++ b/src/augmentation/components/retrievers/hybrid/retriever.py
@@ -0,0 +1,68 @@
+from typing import Type
+from llama_index.core import VectorStoreIndex
+from llama_index.core.retrievers import QueryFusionRetriever
+from llama_index.retrievers.bm25 import BM25Retriever
+from augmentation.bootstrap.configuration.configuration import (
+ AugmentationConfiguration,
+)
+from core.base_factory import Factory
+from embedding.embedding_models.registry import EmbeddingModelRegistry
+from embedding.vector_stores.registry import VectorStoreRegistry
+
+
+class HybridRetrieverFactory(Factory):
+ """
+ Factory class for creating Hybrid (Vector + BM25) retriever instances.
+
+ This factory implements the Factory design pattern to create a hybrid retriever
+ component that uses Query Fusion to combine multiple retrieval results.
+ """
+
+ _configuration_class: Type = AugmentationConfiguration
+
+ @classmethod
+ def _create_instance(
+ cls, configuration: AugmentationConfiguration
+ ) -> QueryFusionRetriever:
+ """
+ Creates a Hybrid retriever instance based on the provided configuration.
+ """
+ vector_store_configuration = configuration.embedding.vector_store
+ vector_store = VectorStoreRegistry.get(vector_store_configuration.name).create(
+ vector_store_configuration
+ )
+
+ embedding_model_config = configuration.embedding.embedding_model
+ embedding_model = EmbeddingModelRegistry.get(embedding_model_config.provider).create(
+ embedding_model_config
+ )
+
+ index = VectorStoreIndex.from_vector_store(
+ vector_store=vector_store, embed_model=embedding_model
+ )
+
+ config = configuration.retriever.configuration
+
+ # In a production RAG system, the BM25 retriever would typically be
+ # initialized with nodes from the document store.
+
+ vector_retriever = index.as_retriever(
+ similarity_top_k=config.similarity_top_k
+ )
+
+ # Initialize BM25 retriever
+ # Note: In LlamaIndex, BM25Retriever needs nodes to be initialized.
+ bm25_retriever = BM25Retriever.from_defaults(
+ nodes=[],
+ similarity_top_k=config.similarity_top_k
+ )
+
+ retriever = QueryFusionRetriever(
+ [vector_retriever, bm25_retriever],
+ similarity_top_k=config.similarity_top_k,
+ num_queries=config.num_queries,
+ mode=config.fusion_mode,
+ use_async=True,
+ )
+
+ return retriever