A comprehensive Retrieval-Augmented Generation (RAG) laboratory for testing different combinations of Large Language Models (LLMs) and embedding models. This platform provides both a WebUI interface and OpenAI-compatible API access for experimentation and development.
```
WebUI ──► Router ──► ├── Milvus (Vector Database)
                     ├── Models (LLMs)
                     └── Embeddings (Vector Models)
```
The system uses a router-based architecture where:
- WebUI: User-friendly interface for interactive testing
- Router: Central component that manages model selection and request routing
- Milvus: Vector database for storing and retrieving document embeddings
- Models: Large Language Models (Granite, Llama, etc.)
- Embeddings: Vector embedding models for document processing
- ArgoCD must be installed and running in your OpenShift cluster
- MinIO must be deployed and accessible (required for document storage)
- GPU nodes properly labeled for model deployment
- OpenShift/Kubernetes cluster with appropriate resources
1. Clone the repository

   ```sh
   git clone https://github.com/alpha-hack-program/rag-base.git
   cd rag-base
   ```

2. Configure environment

   ```sh
   cp env.sample .env
   # Edit .env with your specific configuration
   ```

3. Deploy using ArgoCD

   ```sh
   ./bootstrap/deploy.sh
   ```
The `.env` file is mandatory for deployment. Copy `env.sample` to `.env` and adapt the following key sections:
- ArgoCD Namespace: Where ArgoCD is installed
- Project Namespace: Target namespace for RAG deployment
- Git Repository: Repository URL and branch
- MinIO Credentials: Access key, secret key, and endpoint
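As an illustrative aside, the `.env` format used above is plain `KEY=VALUE` lines; a minimal sketch of reading such a file (the `parse_env` helper is hypothetical, not part of this repository) looks like:

```python
# Hypothetical helper: parse KEY=VALUE lines from a .env-style file,
# skipping blank lines and # comments.
def parse_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """
# MinIO settings
MINIO_ACCESS_KEY=your_access_key
MINIO_ENDPOINT=minio.your-namespace.svc:9000
NODE_SELECTOR_KEY=group
"""
print(parse_env(sample)["NODE_SELECTOR_KEY"])  # group
```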
MinIO is required for document storage and pipeline artifacts:

```sh
MINIO_ACCESS_KEY=your_access_key
MINIO_SECRET_KEY=your_secret_key
MINIO_ENDPOINT=minio.your-namespace.svc:9000
```

Node selection is configured through these variables:

```sh
NODE_SELECTOR_KEY=group
NODE_SELECTOR_VALUE=rag-base
GPU_TYPE=nvidia.com/gpu
```

Label your nodes with the configured selectors:
```sh
# Label nodes for general workloads
oc label node <node-name> group=rag-base

# Label GPU nodes for embedding models
oc label node <gpu-node-name> modelType=embedding

# Label GPU nodes for specific model types
oc label node <gpu-node-name> model=granite
oc label node <gpu-node-name> model=llama
```

The deployment script includes convenient labels for node selection:

- `modelType: 'embedding'` - For embedding model workloads
- `model: 'granite'` - For Granite LLM workloads
- `model: 'llama'` - For Llama LLM workloads
Either adapt these labels in the deployment configuration or label your nodes accordingly!
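For orientation, a node selector plus GPU request in a Helm-style values file might look like the fragment below. The key names are illustrative assumptions; check `values.yaml` in `gitops/rag-base/` for the actual schema.

```yaml
# Illustrative fragment only; field names may differ in values.yaml
nodeSelector:
  group: rag-base
  modelType: embedding
resources:
  limits:
    nvidia.com/gpu: 1
```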
- Granite 3.3 8B: `granite-3-3-8b`
  - Model: `ibm-granite/granite-3.3-8b-instruct`
  - Features: Tool calling, enhanced instruction following
- Llama 3.1 8B: `llama-3-1-8b-w4a16`
  - Model: `RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16`
  - Features: Quantized for efficiency, tool calling
- Multilingual E5 Large: `multilingual-e5-large-gpu`
  - Model: `intfloat/multilingual-e5-large`
  - Dimensions: 1024, Max tokens: 512
- BGE M3: `bge-m3-gpu`
  - Model: `BAAI/bge-m3`
  - Dimensions: 1024, Max tokens: 8192
- Jina Embeddings v3: `jina-embeddings-v3-gpu`
  - Model: `jinaai/jina-embeddings-v3`
  - Dimensions: 1024, Max tokens: 8192
Important: There is no default embedding model. Instead, the system uses composed model names that let you select both the LLM and the embedding model simultaneously through the router.
The router exposes endpoints with combined model and embedding identifiers:
- Format: `{llm-model}+{embedding-model}`
- Example: `granite-3-3-8b+multilingual-e5-large-gpu`

This approach allows you to test different combinations:

- `granite-3-3-8b` with `bge-m3-gpu`
- `llama-3-1-8b-w4a16` with `jina-embeddings-v3-gpu`
- And any other combination
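The composed-name convention can be sketched with two small helpers. These are illustrative only (the router's actual parsing code may differ), but they show how a `{llm-model}+{embedding-model}` identifier is built and taken apart:

```python
# Illustrative helpers for the composed model-name convention
# "{llm-model}+{embedding-model}"; not the router's actual code.
def compose_model(llm: str, embedding: str) -> str:
    return f"{llm}+{embedding}"

def split_model(composed: str) -> tuple[str, str]:
    # partition on the first '+' so the two halves are recovered intact
    llm, _, embedding = composed.partition("+")
    return llm, embedding

name = compose_model("granite-3-3-8b", "multilingual-e5-large-gpu")
print(name)  # granite-3-3-8b+multilingual-e5-large-gpu
print(split_model(name))
```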
- User-friendly web interface for interactive testing
- Document upload and management
- Real-time model comparison
- Visual feedback and results
Access the RAG system programmatically using the OpenAI SDK:

```python
from openai import OpenAI

# Configure the client to use the RAG router
client = OpenAI(
    base_url="http://router-endpoint/v1",
    api_key="not-needed"
)

# Use composed model names to select the LLM and embedding model
response = client.chat.completions.create(
    model="granite-3-3-8b+multilingual-e5-large-gpu",
    messages=[
        {"role": "user", "content": "Your question here"}
    ]
)
```

Milvus is automatically deployed as the vector database for this RAG system:
- Stores document embeddings for retrieval
- Supports multiple collections for different embedding models
- Provides efficient similarity search
- Includes Attu web interface for database management
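Conceptually, the similarity search Milvus provides can be illustrated with a toy cosine-similarity ranking. This brute-force loop is only to show the idea; Milvus does the same thing at scale using ANN indexes:

```python
# Toy illustration of vector similarity search: rank stored chunk
# embeddings by cosine similarity to a query embedding.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Tiny made-up "collection" of 3-dimensional embeddings
store = {
    "chunk-1": [1.0, 0.0, 0.0],
    "chunk-2": [0.7, 0.7, 0.0],
    "chunk-3": [0.0, 1.0, 0.0],
}
query = [1.0, 0.1, 0.0]

ranked = sorted(store, key=lambda k: cosine(query, store[k]), reverse=True)
print(ranked[0])  # chunk-1
```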
- Router: Central API gateway with OpenAI-compatible interface
- WebUI: Interactive web application for testing
- Milvus: Vector database for embeddings storage
- LSD (Llama Stack Distribution): Model serving infrastructure
- MCP Servers: Model Context Protocol servers for enhanced capabilities
- Kubeflow Pipelines: Document processing and embedding generation
- S3 Storage: Document and artifact storage via MinIO
- Vector Processing: Automatic chunking and embedding generation
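The "automatic chunking" step above can be sketched as fixed-size chunking with overlap. The sizes here are example values, not the pipeline's actual parameters:

```python
# Illustrative fixed-size chunking with overlap, the kind of
# pre-processing applied to documents before embedding.
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    chunks = []
    step = size - overlap  # advance by size minus overlap each time
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece:
            chunks.append(piece)
    return chunks

doc = "x" * 500
print(len(chunk_text(doc)))  # 4
```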
- Node Selection Errors
  - Ensure nodes are labeled with the configured selectors
  - Verify GPU availability and labels
  - Check node selector configuration in `.env`
- MinIO Connection Issues
  - Verify MinIO credentials and endpoint
  - Ensure MinIO is accessible from the cluster
  - Check bucket creation and permissions
- ArgoCD Application Errors
  - Confirm ArgoCD is running and accessible
  - Verify repository URL and credentials
  - Check namespace permissions
```sh
# Check ArgoCD application status
oc get applications -n openshift-gitops

# View router logs
oc logs -n <namespace> deployment/router

# Check model serving status
oc get inferenceservices -n <namespace>
```

To add a new model:

- Update `values.yaml` in `gitops/rag-base/`
- Add model configuration with appropriate resource requirements
- Configure node selectors and GPU requirements
- Update router model mappings
Prompts are managed through ConfigMaps and can be customized:
- Edit prompt templates in router configuration
- Support for context-aware and context-free prompts
- Dynamic prompt injection for different use cases
The repository includes sample documents in multiple languages:
- English: `/examples/documents/en/`
- Spanish: `/examples/documents/es/`
- Portuguese: `/examples/documents/pt/`
- LaTeX sources: `/examples/documents/latex/`
- Fork the repository
- Create a feature branch
- Test your changes with the full deployment
- Submit a pull request
This project is part of the Alpha Hack Program. See individual component licenses for details.
- Ensure proper GPU node labeling before deployment
- MinIO is a hard requirement - the system will not function without it
- The namespace configuration in `.env` is critical for proper component communication
- ArgoCD must be pre-installed and configured
- Test model combinations thoroughly to find optimal performance for your use cases