A production-ready Retrieval-Augmented Generation (RAG) pipeline built with modern technologies, designed for CPU deployment with enterprise-grade features.
```
┌─────────────────────────────────────────────────────────────────────┐
│                         RAGOPS ARCHITECTURE                         │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │─────│    Nginx    │─────│   FastAPI   │─────│ Meilisearch │
│ Application │     │ (Optional)  │     │   Backend   │     │   Search    │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
                                               │                   │
                                               │             ┌──────────┐
                                               │             │ Document │
                                               │             │ & Chunks │
                                               │             │ Indexes  │
                                               │             └──────────┘
                                               │
┌─────────────┐                                │             ┌─────────────┐
│    Redis    │────────────────────────────────┼─────────────│   LiteLLM   │
│   Caching   │                                │             │    Proxy    │
└─────────────┘                                │             └─────────────┘
                                               │                    │
                                               │              ┌──────────┐
                                               │              │   Groq   │
                                               └──────────────│   LLM    │
                                                              │ Provider │
┌─────────────┐                                               └──────────┘
│     TEI     │                                                     │
│ Embeddings  │─────────────────────────────────────────────────────┘
│   Service   │
└─────────────┘
```
Flow:
1. Documents → Ingestion → Chunking → Embeddings → Meilisearch
2. Query → FastAPI → Meilisearch (Hybrid Search) → Context → LLM → Response
3. Redis caches embeddings and responses for performance

- 🔍 Hybrid Search: Combines vector similarity and BM25 text search (see the sketch after this list)
- 📄 Document Processing: Supports multiple document types with intelligent chunking
- 🧠 LLM Integration: Groq LLMs via LiteLLM proxy with fallback support
- ⚡ High Performance: Redis caching with 5-50x speed improvements
- 🎯 Semantic Retrieval: TEI embeddings for semantic understanding
- 🔧 Production Ready: Docker Compose orchestration with health checks
- CPU Optimized: Runs efficiently on CPU-only infrastructure
- Scalable Architecture: Microservices design with independent scaling
- Enterprise Security: Authentication, authorization, and secure communication
- Monitoring & Logging: Comprehensive observability stack
- API Documentation: Auto-generated OpenAPI/Swagger documentation
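
To make the hybrid search feature concrete, the sketch below issues a hybrid query directly against Meilisearch. The `chunks` index name, the embedder name `default`, and the `semanticRatio` value are illustrative assumptions, not the project's actual settings:

```python
# Hypothetical hybrid query mixing BM25 keyword relevance with vector
# similarity. Index name, embedder name, and semanticRatio are assumptions.
import os

import requests

resp = requests.post(
    "http://localhost:7700/indexes/chunks/search",
    headers={"Authorization": f"Bearer {os.environ['MEILI_KEY']}"},
    json={
        "q": "what are vector embeddings?",
        "hybrid": {
            "semanticRatio": 0.5,  # 0.0 = pure BM25, 1.0 = pure semantic
            "embedder": "default",
        },
        "limit": 5,
    },
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json()["hits"]:
    print(hit["id"])
```

Raising `semanticRatio` favors vector similarity; lowering it favors exact keyword matches.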
- Docker & Docker Compose: Latest versions
- 4GB+ RAM: Recommended for optimal performance
- API Keys: Groq API key for LLM access
- Storage: 2GB+ free disk space for models and indexes
```bash
git clone <repository-url>
cd RAGOPS
```

```bash
# Copy and configure environment
cp .env.example .env
# Edit .env with your API keys (see Configuration section)
```

```bash
# Start all services
docker-compose up -d

# Check service health
docker-compose ps

# View logs
docker-compose logs -f
```

```bash
# Check API health
curl http://localhost:18000/health
```

```env
# Meilisearch Configuration
MEILI_KEY=your_secure_master_key_here
MEILI_INDEX=documents
EMBED_DIM=384

# LLM Provider Configuration
LITELLM_KEY=your_proxy_key_here
GROQ_API_KEY=your_groq_api_key_here

# Optional: Additional LLM providers
OPENAI_API_KEY=your_openai_key_here
HUGGINGFACE_API_KEY=your_hf_key_here

# Service URLs (Docker internal)
MEILI_URL=http://meilisearch:7700
PROXY_URL=http://litellm:4000
REDIS_URL=redis://redis:6379
```

Edit `litellm/config.yaml` to customize:
```yaml
model_list:
  # Primary chat/completions model on Groq
  - model_name: groq-llama3
    litellm_params:
      model: groq/llama-3.1-8b-instant
      api_key: os.environ/GROQ_API_KEY

  # Local embeddings served by TEI (OpenAI-compatible embeddings API)
  - model_name: local-embeddings
    litellm_params:
      model: openai/text-embedding-ada-002
      api_key: os.environ/GROQ_API_KEY
      api_base: "http://tei-embeddings:80"
      custom_llm_provider: openai
      timeout: 60

# Global LiteLLM settings
litellm_settings:
  cache: true
  cache_params:
    type: "redis"
    url: "redis://redis:6379"
    ttl: 1800
    supported_call_types: ["completion", "chat_completion", "embedding", "acompletion", "aembedding"]
  # Prompt injection basic guards
  prompt_injection_params:
    heuristics_check: true
    similarity_check: false
    vector_db_check: false

# Routing / fallbacks
router_settings:
  fallbacks:
    - "groq-llama3": []
```

| Service | Port | Description | Health Check |
|---|---|---|---|
| FastAPI Backend | 18000 | Main API server | GET /health |
| Meilisearch | 7700 | Search & vector database | GET /health |
| LiteLLM Proxy | 4000 | LLM routing proxy | GET /health |
| TEI Embeddings | 80 | Text embeddings service | GET /health |
| Redis | 6379 | Caching layer | TCP check |
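
Because the LiteLLM proxy exposes an OpenAI-compatible API, any OpenAI SDK can talk to it using the model aliases defined in `config.yaml`. A minimal sketch, assuming the proxy is reachable on localhost and `LITELLM_KEY` is set as in `.env`:

```python
# Minimal sketch: call the LiteLLM proxy through the OpenAI client.
# Assumes the port mapping from the table above and the aliases from
# litellm/config.yaml; this mirrors, not replaces, the backend's own client.
import os

from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key=os.environ["LITELLM_KEY"])

# Chat completion routed to Groq via the "groq-llama3" alias
chat = client.chat.completions.create(
    model="groq-llama3",
    messages=[{"role": "user", "content": "Summarize what a RAG pipeline does."}],
)
print(chat.choices[0].message.content)

# Embeddings routed to the local TEI service via the "local-embeddings" alias
emb = client.embeddings.create(model="local-embeddings", input=["vector search"])
print(len(emb.data[0].embedding))  # expected: 384 dimensions (EMBED_DIM)
```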
- Document Ingestion:
  Documents → FastAPI → Processing → Embeddings (TEI) → Meilisearch

- Query Processing:
  Query → FastAPI → Embeddings (TEI) → Search (Meilisearch) → Context → LLM (Groq) → Response

- Caching Layer:
  Redis caches: Embeddings (1h TTL) | LLM Responses (10min TTL)
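
The embedding cache is a read-through cache keyed on a hash of the input text. The sketch below illustrates that pattern; the key prefix and helper name are hypothetical, not the code in `backend/app/utils/cache.py`:

```python
# Illustrative sketch of the embedding cache pattern described above.
# Key scheme, TTL constant, and function names are assumptions for
# illustration; see backend/app/utils/cache.py for the real implementation.
import hashlib
import json

import redis

r = redis.Redis.from_url("redis://localhost:6379")
EMBEDDING_TTL = 3600  # 1 hour, matching the TTL noted above


def cached_embedding(text: str, embed_fn) -> list[float]:
    """Return an embedding, consulting Redis before calling the TEI service."""
    key = "emb:" + hashlib.sha256(text.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)   # cache hit: skip the embedding call
    vector = embed_fn(text)      # cache miss: call TEI (or the LiteLLM proxy)
    r.setex(key, EMBEDDING_TTL, json.dumps(vector))
    return vector
```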
```bash
# Check all services
docker-compose ps

# View service logs
docker-compose logs [service-name]

# Monitor resource usage
docker stats

# Test individual components
curl http://localhost:7700/health   # Meilisearch
curl http://localhost:18000/health  # FastAPI Backend
```

```
.
├── Makefile
├── README.md
├── backend
│   ├── Dockerfile
│   ├── app
│   │   ├── __init__.py
│   │   ├── api
│   │   │   ├── __init__.py
│   │   │   ├── chat.py
│   │   │   ├── embeddings.py
│   │   │   ├── health.py
│   │   │   ├── ingest.py
│   │   │   ├── pdf.py
│   │   │   ├── search.py
│   │   │   └── stats.py
│   │   ├── core
│   │   │   ├── clients.py
│   │   │   ├── config.py
│   │   │   └── logging.py
│   │   ├── main.py
│   │   ├── models
│   │   │   ├── __init__.py
│   │   │   ├── chat.py
│   │   │   ├── documents.py
│   │   │   ├── health.py
│   │   │   ├── responses.py
│   │   │   └── search.py
│   │   ├── services
│   │   │   ├── __init__.py
│   │   │   ├── chunking.py
│   │   │   ├── embeddings.py
│   │   │   ├── ingestion.py
│   │   │   ├── llm_service.py
│   │   │   ├── pdf_processor.py
│   │   │   ├── rag_service.py
│   │   │   └── search_service.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── cache.py
│   │       └── hashing.py
│   ├── requirements.txt
│   └── seed_data.py
├── docker-compose.yml
├── litellm
│   └── config.yaml
├── pdf_files
│   ├── autoencoders.pdf
│   ├── linear_algebra.pdf
│   └── linear_factor_models.pdf
├── scripts
│   └── meili-init.sh
└── tests
    ├── chunking_validation.py
    ├── debug_vector_search.py
    ├── demo_phase2.py
    ├── demo_working_features.py
    ├── final_rag_test_report.py
    ├── test_all_features.py
    ├── test_direct_ingest.py
    └── test_phase2_comprehensive.py
```
- Horizontal Scaling:

  ```yaml
  # In docker-compose.yml
  backend:
    deploy:
      replicas: 3
  redis:
    deploy:
      replicas: 1  # Redis should remain a single instance
  ```

- Resource Allocation:

  ```yaml
  services:
    backend:
      deploy:
        resources:
          limits:
            memory: 2G
            cpus: '1.0'
  ```

- Data Persistence:

  ```yaml
  volumes:
    meili_data:
      driver: local
      driver_opts:
        type: none
        o: bind
        device: /data/meilisearch
  ```
Phase 3 will extend RAGOPS with advanced document processing capabilities and search result reranking to create a comprehensive enterprise-grade RAG system.
- ✅ Phase 1: Text-based chunking and ingestion
- ✅ Phase 2: Embeddings integration with semantic search
- 🎯 Phase 3: PDF processing + reranking (this document)
```
┌─────────────────┐    ┌──────────────┐    ┌────────────────┐
│   PDF Upload    │───▶│  LangChain   │───▶│    Existing    │
│   Interface     │    │  Processor   │    │    Pipeline    │
└─────────────────┘    └──────────────┘    └────────────────┘
                              │
                              ▼
                       ┌──────────────┐
                       │    Cross-    │
                       │   Encoder    │
                       │   Reranker   │
                       └──────────────┘
```
- PDF Upload → LangChain PyPDFLoader → Page extraction
- Text Processing → RecursiveCharacterTextSplitter → Smart chunking
- Metadata Enrichment → Page numbers, file info, structure
- Existing Pipeline → Embeddings → Meilisearch storage
- Enhanced Search → Initial retrieval → Cross-encoder reranking → Final results
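
A minimal sketch of the loading, chunking, and reranking steps above, assuming recent LangChain packaging (`langchain-community`, `langchain-text-splitters`) and a sentence-transformers cross-encoder; the chunk sizes and the model name are illustrative, not the project's exact settings:

```python
# Illustrative sketch of the Phase 3 pipeline: PDF loading, smart chunking,
# and cross-encoder reranking. Chunk parameters and the reranker model are
# assumptions for illustration.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from sentence_transformers import CrossEncoder

# PDF Upload → PyPDFLoader → page extraction (one Document per page)
pages = PyPDFLoader("pdf_files/autoencoders.pdf").load()

# Text Processing → RecursiveCharacterTextSplitter → smart chunking;
# page numbers and file info are carried along in chunk metadata
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(pages)

# Enhanced Search → rerank initially retrieved chunks with a cross-encoder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "What is an undercomplete autoencoder?"
scores = reranker.predict([(query, c.page_content) for c in chunks])
top = sorted(zip(scores, chunks), key=lambda p: p[0], reverse=True)[:5]
for score, chunk in top:
    print(f"{score:.3f}  page {chunk.metadata.get('page')}  {chunk.page_content[:60]}")
```

In production the reranker would score only the top-k hits returned by Meilisearch rather than every chunk, which keeps the cross-encoder's quadratic cost bounded.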
- `make up`: Builds images, starts all services, and waits for readiness
- `make test`: Runs the comprehensive Phase 2 validation suite
- `make demo`: Interactive demonstration of key features
- `make validate`: Complete system validation and feature testing

Run `make help` to see all available commands:

```bash
make help
```

- `make up`: Start all RAGOPS services
- `make down`: Stop all services
- `make restart`: Restart all services
- `make logs`: Show backend service logs
- `make clean`: Clean up Docker resources
- `make test`: Run Phase 2 comprehensive tests
- `make demo`: Run feature demonstrations
- `make validate`: Validate all system features
- `make dev-logs`: Follow all service logs
- `make dev-rebuild`: Rebuild and restart backend only
- `make dev-reset`: Complete system reset with fresh data
```
┌─────────────────┐    ┌──────────────┐    ┌────────────────┐
│    Frontend     │───▶│   Backend    │───▶│  Meilisearch   │
│    (Future)     │    │   FastAPI    │    │   + Vector     │
└─────────────────┘    └──────────────┘    └────────────────┘
                              │                    │
                              ▼                    ▼
                       ┌──────────────┐    ┌────────────────┐
                       │   LiteLLM    │    │     Redis      │
                       │    Proxy     │    │     Cache      │
                       └──────────────┘    └────────────────┘
                              │
                              ▼
                       ┌──────────────┐    ┌────────────────┐
                       │     TEI      │    │     Groq       │
                       │  Embeddings  │    │      LLM       │
                       └──────────────┘    └────────────────┘
```
- Backend (Port 18000): FastAPI with Phase 2 embeddings
- Meilisearch (Port 7700): Vector-enabled search engine
- TEI-Embeddings (Port 8080): Text embeddings inference
- LiteLLM (Port 4000): Multi-provider LLM proxy
- Redis (Port 6379): Embedding cache layer
- Meili-Init: Automated index configuration
```bash
curl -s http://localhost:18000/health | jq .
```

Response:

```json
{
  "status": "healthy",
  "embeddings_available": true,
  "embedding_dimensions": 384
}
```

```bash
curl -X POST "http://localhost:18000/ingest" \
  -H "Content-Type: application/json" \
  -d '[{
    "id": "doc1",
    "text": "Your document content here...",
    "metadata": {"title": "Document Title", "category": "docs"}
  }]' | jq .
```

```bash
curl -X POST "http://localhost:18000/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "What are vector embeddings?", "k": 5}' | jq .
```

```bash
curl -X POST "http://localhost:18000/test-embeddings" \
  -H "Content-Type: application/json" \
  -d '["text to embed", "another text"]' | jq .
```

```bash
curl -X POST "http://localhost:18000/init-index"
```

Create a .env file with your configuration:
```env
# Required: Groq API Key
GROQ_API_KEY=your_groq_api_key_here

# Meilisearch Configuration
MEILI_KEY=change_me_master_key

# Performance Settings (optional)
REDIS_CACHE_TTL=3600
MAX_CHUNK_SIZE=500
CHUNK_OVERLAP=50
```

All service URLs are automatically configured for Docker Compose:

```env
PROXY_URL=http://litellm:4000
MEILISEARCH_URL=http://meilisearch:7700
REDIS_URL=redis://redis:6379
TEI_URL=http://tei-embeddings:8080
```
- `make test`: Runs the comprehensive Phase 2 test suite
- `make validate`: Tests all features and generates detailed reports

All tests are in the `tests/` directory:

- `tests/test_phase2_comprehensive.py`: Complete Phase 2 validation
- `tests/test_all_features.py`: Comprehensive feature testing
- `tests/chunking_validation.py`: Text chunking validation
```bash
# After make up, check services
docker compose ps

# Check resource usage
docker stats
```

`make clean` removes containers, volumes, and prunes Docker resources.

```bash
# Clear the Redis cache
docker compose exec redis redis-cli FLUSHALL
```

```bash
# Backup documents
curl -H "Authorization: Bearer $MEILI_KEY" \
  "http://localhost:7700/indexes/documents/documents" > backup_documents.json

# Backup chunks
curl -H "Authorization: Bearer $MEILI_KEY" \
  "http://localhost:7700/indexes/chunks/documents" > backup_chunks.json
```