A full-stack Retrieval-Augmented Generation (RAG) system that lets you upload your own knowledge base and chat with it using Google Gemini. Built with LangChain, MongoDB Atlas Vector Search, and Streamlit.
- Vector Store: MongoDB Atlas Vector Search stores and retrieves document embeddings
- Embeddings: Powered by `sentence-transformers/all-mpnet-base-v2` (768 dimensions)
- LLM: Google Gemini `gemini-2.5-flash` for natural language responses
- Vector Visualization: PCA-based 2D semantic distance map using Plotly
- Framework: Built with LangChain + Streamlit
rag_template/
├── .streamlit/
│ └── secrets.toml # API keys (never commit this!)
├── assets/
│ ├── chatbot_homepage.png
│ ├── mongo_db_data.png
│ └── semantic_distance_map.png
├── pages/
│ └── vector_graph.py # Vector visualization page
├── backend.py # Core RAG logic
├── home.py # Streamlit chat UI
└── requirements.txt
- Python 3.8+
- A MongoDB Atlas cluster with Vector Search enabled
- A Google AI Studio API key (Gemini)
1. Clone the repository:
git clone <your-repo-url>
cd rag_template

2. Create and activate a virtual environment:
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

3. Install dependencies:
pip install -r requirements.txt

Create a `.streamlit/secrets.toml` file in the project root:
MONGO_URI = "mongodb+srv://<username>:<password>@<cluster>.mongodb.net/?retryWrites=true&w=majority"
GEMINI_API_KEY = "your-gemini-api-key"
⚠️ Never commit `secrets.toml` to GitHub. Make sure `.gitignore` includes `.streamlit/secrets.toml`.
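
A common setup mistake is leaving the template placeholders in `secrets.toml`. The sketch below is one way to catch that early. `check_secrets` and `REQUIRED_KEYS` are hypothetical names for illustration and are not part of this repository; the function only assumes it receives a mapping with the two keys shown above.

```python
import re
from typing import List, Mapping

# Keys taken from the secrets.toml template above.
REQUIRED_KEYS = ("MONGO_URI", "GEMINI_API_KEY")

# Matches unfilled template placeholders such as <username> or <cluster>.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def check_secrets(secrets: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of the repository.
    """
    problems = []
    for key in REQUIRED_KEYS:
        value = secrets.get(key, "")
        if not value:
            problems.append(f"{key} is missing or empty")
        elif PLACEHOLDER.search(value) or value == "your-gemini-api-key":
            problems.append(f"{key} still contains a template placeholder")

    # A filled-in URI should use one of the two MongoDB URI schemes.
    uri = secrets.get("MONGO_URI", "")
    if uri and not uri.startswith(("mongodb://", "mongodb+srv://")):
        problems.append("MONGO_URI does not look like a MongoDB connection string")
    return problems
```

In a Streamlit app this could be called with the values read from `st.secrets` before connecting, so a misconfiguration is reported in the UI instead of surfacing as a connection error.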
- Create a collection: `vector_store_database.embeddings_stream`
- Create a Vector Search Index named `vector_index` with this definition:

{
  "fields": [
    {
      "numDimensions": 768,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}

User Input Text
↓
HuggingFace Embeddings (768-dim vectors)
↓
MongoDB Atlas Vector Store
↓
User Query → Similarity Search (top 3 docs)
↓
Context + Query → Gemini LLM
↓
Answer + Sources
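
To make the flow above concrete, here is a toy, dependency-free sketch of the same steps: embed, store, retrieve the top 3 by cosine similarity, and assemble the prompt. Everything in it is a stand-in and an assumption: `embed` uses word counts instead of the 768-dimensional HuggingFace vectors, `InMemoryStore` replaces MongoDB Atlas, and `build_prompt` only produces the text that would be sent to Gemini. It is not the code in `backend.py`.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Stand-in embedding: word counts instead of a dense 768-dim vector."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Stand-in for the MongoDB Atlas vector store (hypothetical class)."""

    def __init__(self) -> None:
        self.docs: List[Tuple[str, Dict[str, float]]] = []

    def ingest(self, text: str) -> None:
        # Store the text together with its embedding.
        self.docs.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        # Score every document against the query and keep the k best.
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), text) for text, vec in self.docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(text, score) for score, text in scored[:k]]


def build_prompt(query: str, sources: List[str]) -> str:
    """Combine retrieved context and the query into one LLM prompt."""
    context = "\n".join(f"- {source}" for source in sources)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


store = InMemoryStore()
for doc in (
    "MongoDB Atlas stores the document embeddings",
    "Streamlit renders the chat interface",
    "Gemini generates the final answer",
    "Plotly draws the semantic distance map",
):
    store.ingest(doc)

hits = store.search("Where are embeddings stored in MongoDB?")
prompt = build_prompt("Where are embeddings stored in MongoDB?", [t for t, _ in hits])
```

The real system differs mainly in where each step runs: the embedding model produces dense vectors, Atlas performs the similarity search, and the prompt goes to Gemini, which returns the answer alongside the retrieved sources.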
| Function | Description |
|---|---|
| `get_vector_store()` | Connects to MongoDB and loads the embedding model |
| `ingest_text(text)` | Converts text to vector and stores in MongoDB |
| `get_rag_response(query)` | Retrieves top 3 similar docs and generates a Gemini answer |
| `get_vectors_for_visualization(query)` | Returns vectors for PCA plotting |
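
The "top 3 similar docs" retrieval maps onto an Atlas `$vectorSearch` aggregation stage. LangChain's MongoDB integration handles this internally, so the sketch below is only an illustration of the query shape, not the repository's code. `build_vector_search_pipeline` is a hypothetical helper, the `text` field name and the `numCandidates` value of 100 are assumptions, and the index name, path, and dimension come from the index definition in the setup section.

```python
from typing import Any, Dict, List, Sequence

EMBEDDING_DIM = 768  # matches numDimensions in the index definition


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    k: int = 3,
    num_candidates: int = 100,  # assumption: must be >= k
    index_name: str = "vector_index",
    path: str = "embedding",
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that returns the k nearest documents."""
    if len(query_vector) != EMBEDDING_DIM:
        raise ValueError(
            f"expected {EMBEDDING_DIM} dimensions, got {len(query_vector)}"
        )
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": k,
            }
        },
        # Keep the stored text (assumed field name) and the similarity score.
        {
            "$project": {
                "_id": 0,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, the resulting list could be passed to `collection.aggregate(...)` on the `embeddings_stream` collection, using a query vector produced by the same embedding model that was used at ingestion time.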
streamlit run home.py

Open http://localhost:8501 in your browser.
- Paste your knowledge in the sidebar → click Upload to MongoDB
- Ask questions in the chat input
- Visit the Vector Visualization page to explore semantic similarity
| Tool | Purpose |
|---|---|
| Streamlit | Frontend UI |
| LangChain | RAG pipeline orchestration |
| MongoDB Atlas | Vector store |
| Google Gemini | Language model |
| HuggingFace | Sentence embeddings |
| Plotly + PCA | Vector visualization |
MIT License


