An advanced RAG system built on LangGraph, supporting bilingual (Chinese and English) document retrieval. It combines a hybrid retrieval strategy (BM25 + semantic search) with optional re-ranking, and offers two operational modes: retrieval-augmented mode and direct conversation mode.
- 🔄 Hybrid Retrieval: Combines BM25 sparse retrieval with semantic vector search.
- 🌐 Bilingual Support: Comprehensive support for mixed Chinese and English text preprocessing and retrieval.
- 🎯 Re-ranking Optimization: Support for the Qwen re-ranking model and the FlashRank re-ranker.
- 💾 Persistent Storage: Vector database based on ChromaDB, with support for multi-collection management.
- 🛠️ Tool Integration: An intelligent tool-calling system based on LangGraph.
- ⚡ Asynchronous Processing: Asynchronous handling of document loading and embedding to improve performance.
- 🎛️ Configurable: A rich set of environment variable configuration options.
Document Processing Module
- PDF document loading.
- Text chunking (with overlap support); see the sketch after this list.
- Bilingual text preprocessing.
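A minimal sketch of this stage, assuming LangChain's `PyPDFLoader` and `RecursiveCharacterTextSplitter`; the file name and chunk sizes are illustrative, not the project's actual defaults:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load one PDF into per-page Document objects.
loader = PyPDFLoader("example.pdf")  # hypothetical input file
docs = loader.load()

# Overlapping chunks preserve context that a hard cut at a chunk
# boundary would otherwise lose.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)
```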
Retrieval Module
- BM25 sparse retriever.
- Vector semantic retriever.
- Ensemble Retriever (for hybrid search).
- Re-ranking compressor.
Language Model
- Support for multiple LLM providers.
- Configurable temperature parameter.
- Tool-calling capabilities.
State Management
- State graph based on LangGraph (see the sketch after this list).
- Support for document deduplication.
- Mode switching (retrieval/direct).
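A minimal sketch of what such a graph can look like, using LangGraph's `StateGraph`; the state field and node names (`mode`, `retrieve`, `generate`) are illustrative assumptions, not the names used in `rag.py`:

```python
from typing import Literal

from langgraph.graph import END, START, MessagesState, StateGraph

class RAGState(MessagesState):
    mode: str  # "retrieve" or "direct"

def retrieve_node(state: RAGState) -> dict:
    # Run the hybrid retriever and attach the results to the state.
    return {}

def generate_node(state: RAGState) -> dict:
    # Call the LLM (with tools bound) to produce the answer.
    return {}

def route_mode(state: RAGState) -> Literal["retrieve", "generate"]:
    # /retrieve goes through retrieval first; /direct skips straight to the LLM.
    return "retrieve" if state["mode"] == "retrieve" else "generate"

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve_node)
graph.add_node("generate", generate_node)
graph.add_conditional_edges(START, route_mode)
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
app = graph.compile()
```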
Requirements
- Python 3.8+.
- PyTorch (with optional CUDA support).
- The package dependencies listed in requirements.txt.
Install the dependencies:

```bash
pip install -r requirements.txt
```

Copy and configure the `.env-backup` file (ignore `.env`). For more detailed configuration options, refer to the `.env-backup` file:

```
# LangSmith Tracing
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your_langsmith_api_key
# Model Configuration
MODEL_PROVIDER=openai
MODEL_NAME=gpt-4o-mini
OPENAI_BASE_URL=your_openai_base_url
OPENAI_API_KEY=your_openai_api_key
# Embedding Model Configuration
EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
OPENAI_EMBEDDING=false
OPENAI_EMBEDDING_BASE_URL=your_openai_embedding_base_url
OPENAI_EMBEDDING_API_KEY=your_openai_embedding_api_key
# Re-ranking Configuration
RERANKER_ENABLED=false
QWEN_RERANKER=false
```

Run the main program:

```bash
python rag.py
```
Initialization Phase
- Load environment variables.
- Initialize the language model and embedding model.
- Create or select a ChromaDB collection (see the sketch after this list).
- Load PDF documents.
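A minimal sketch of the collection setup, assuming the `langchain-chroma` integration; the collection name is a hypothetical placeholder, while the persist directory matches the project layout shown below:

```python
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
vector_store = Chroma(
    collection_name="my_docs",                  # hypothetical collection name
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # matches the project layout
)
```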
Interaction Phase
- Select an operational mode:
  - `/retrieve` - Retrieval-augmented mode (default).
  - `/direct` - Direct conversation mode.
- Enter a question to interact.
- Type `exit` to quit the system.
Supports batch loading of PDF documents:
- Enter the path to a PDF file to load it.
- Enter `done` to finish loading.
- The system will process document embeddings asynchronously (see the sketch below).
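A minimal sketch of the asynchronous embedding step, assuming a LangChain vector store with `aadd_documents` (and `vector_store`/`chunks` from the sketches above); the batching scheme is an illustrative choice:

```python
import asyncio

async def embed_documents(vector_store, chunks, batch_size=64):
    # Embed batches concurrently instead of blocking on one large call.
    batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
    await asyncio.gather(*(vector_store.aadd_documents(b) for b in batches))

# asyncio.run(embed_documents(vector_store, chunks))
```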
- Retrieval-Augmented Mode (`/retrieve`): The system first retrieves relevant documents and then generates an answer based on the retrieval results.
- Direct Conversation Mode (`/direct`): The system answers the question directly, which can be used for tasks like mathematical calculations.
- Uses jieba for Chinese word segmentation.
- Uses NLTK for English lemmatization.
- Supports filtering of both Chinese and English stop words.
- Handles punctuation.
- BM25 Retrieval: Keyword-based sparse retrieval.
- Semantic Retrieval: Vector similarity-based retrieval.
- Hybrid Retrieval: Fuses the results of the two retrieval methods using RRF (Reciprocal Rank Fusion); see the sketch after this list.
- Re-ranking: Optional document re-ranking for optimization.
- Qwen Native Reranker: Based on the Qwen3-Reranker-0.6B model.
- FlashRank: Based on the ms-marco-MiniLM-L-12-v2 model.
- Simple Compressor: Returns only the top N documents (no re-ranking).
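A minimal sketch of the full retrieval pipeline, assuming LangChain's `EnsembleRetriever` (which fuses the ranked lists with Reciprocal Rank Fusion) and the FlashRank compressor; the weights and `top_n` are illustrative, and `chunks`/`vector_store` are reused from the sketches above:

```python
from langchain.retrievers import ContextualCompressionRetriever, EnsembleRetriever
from langchain_community.document_compressors import FlashrankRerank
from langchain_community.retrievers import BM25Retriever

# Sparse, keyword-based retriever (a custom preprocess_func can be
# passed here for bilingual tokenization).
bm25 = BM25Retriever.from_documents(chunks)

# Dense, vector-similarity retriever backed by ChromaDB.
semantic = vector_store.as_retriever(search_kwargs={"k": 10})

# EnsembleRetriever fuses the two ranked lists with Reciprocal Rank Fusion.
hybrid = EnsembleRetriever(retrievers=[bm25, semantic], weights=[0.5, 0.5])

# Optional re-ranking stage (FlashRank variant) on top of the fused results.
reranker = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2", top_n=5)
retriever = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=hybrid
)

docs = retriever.invoke("your question here")
```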
- Asynchronous document processing.
- CUDA acceleration (if available).
- Parallel task execution.
- Document deduplication mechanism.
```
RAG/
├── rag.py                 # Main program file
├── qwen_reranker.py       # Qwen re-ranker implementation
├── simple_compressor.py   # Simple compressor implementation
├── .env                   # Environment variable configuration
├── chroma_langchain_db/   # ChromaDB database directory
└── README.md              # Project documentation
```
- Create a new tool using the `@tool` decorator.
- Implement the tool function.
- The system will automatically detect and register the tool.
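For example, a hypothetical calculator tool using `langchain_core`'s `@tool` decorator:

```python
from langchain_core.tools import tool

@tool
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""  # the docstring becomes the tool's description
    return a * b
```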
Modify the `bilingual_preprocess_func` function to customize the text preprocessing logic.
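A minimal sketch of what a customized version could look like, assuming jieba for Chinese segmentation and NLTK for English lemmatization as described above; the stop-word set and regexes are illustrative, not the project's actual lists:

```python
import re
from typing import List

import jieba
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()  # requires the NLTK "wordnet" corpus
STOPWORDS = {"the", "a", "an", "的", "了", "是"}  # illustrative subset

def bilingual_preprocess_func(text: str) -> List[str]:
    tokens = []
    # jieba segments Chinese runs and passes English words through as-is.
    for tok in jieba.cut(text):
        tok = tok.strip().lower()
        if not tok or tok in STOPWORDS:
            continue
        if re.fullmatch(r"\W+", tok):          # drop pure punctuation
            continue
        if re.fullmatch(r"[a-z]+", tok):       # English word: lemmatize it
            tok = lemmatizer.lemmatize(tok)
        tokens.append(tok)
    return tokens
```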
Add support for new embedding models in the `init_embedding_model` function.
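A minimal sketch of how this function might branch on the configuration shown earlier; the structure is an assumption, not the actual code in `rag.py`:

```python
import os

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import OpenAIEmbeddings

def init_embedding_model():
    # OPENAI_EMBEDDING=true switches to an OpenAI-compatible endpoint.
    if os.getenv("OPENAI_EMBEDDING", "false").lower() == "true":
        return OpenAIEmbeddings(
            base_url=os.getenv("OPENAI_EMBEDDING_BASE_URL"),
            api_key=os.getenv("OPENAI_EMBEDDING_API_KEY"),
        )
    # Otherwise fall back to a local sentence-transformers model.
    model = os.getenv("EMBEDDING_MODEL", "sentence-transformers/all-mpnet-base-v2")
    return HuggingFaceEmbeddings(model_name=model)
```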
- CUDA Unavailable: The system will automatically fall back to CPU mode.
- Document Loading Failure: Check file paths and permissions.
- Embedding Model Compatibility: Ensure the embedding model used is compatible with the collection.
- Insufficient Memory: Consider reducing the document batch size or using a smaller model.
- Enable LangSmith tracing for debugging.
- Check the detailed logs in the console output.
- Inspect the ChromaDB collection metadata.
- Implementation of the basic RAG system.
- Hybrid retrieval functionality.
- Bilingual support.
- Tool-calling integration.
- Asynchronous processing optimization.
Contributions are welcome! Please submit Issues and Pull Requests to improve the project.