Retrieval Augmented Generation (RAG) application that lets you chat with any of your local documents in disparate formats (e.g., `.txt`, `.pdf`, `.md`, `.docx`, `.doc`, `.json`, `.geojson`) using Ollama LLMs and LangChain. Upload your document in the Streamlit Web UI for Q&A interaction. Have fun!
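At its core, the app follows the standard RAG loop: retrieve the document chunks most relevant to the question, stuff them into the prompt, and let the local LLM answer from that context. A minimal sketch of that loop (the `retrieve` and `build_prompt` helpers here are illustrative placeholders, not the project's actual functions — the real app delegates retrieval to ChromaDB and generation to Ollama):

```python
def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank chunks by how many words they share with the question."""
    q_words = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with retrieved context before calling the LLM."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

chunks = [
    "Ollama runs LLMs locally.",
    "Streamlit renders the web UI.",
    "Paris is in France.",
]
prompt = build_prompt("How does the UI work?", retrieve("How does the UI work?", chunks))
```

The only moving parts the app adds on top of this loop are real embeddings instead of word overlap, and a real model instead of a stub.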
```
├── .streamlit/
│   └── config.toml        # Streamlit configuration (OPTIONAL)
├── assets/
│   └── ui.png             # Streamlit UI image
├── components/
│   ├── __init__.py
│   ├── chat.py            # Chat interface implementation
│   └── upload.py          # Document upload handling
├── core/
│   ├── __init__.py
│   ├── embeddings.py      # Vector embeddings configuration
│   └── llm.py             # Language model setup
├── data/
│   ├── vector_store/      # Vector embeddings stored in ChromaDB
│   └── sample_docs/       # Sample documents for testing
├── utils/
│   ├── __init__.py
│   └── helpers.py         # Utility functions
└── main.py                # Application entry point
```
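The job of `core/embeddings.py` and `data/vector_store/` is to store each chunk as a vector and rank chunks by similarity to the query at answer time. A simplified stand-in for that nearest-neighbour search, using plain cosine similarity (illustrative only — the app delegates storage and search to ChromaDB, and real embeddings come from a model like `mxbai-embed-large`):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "vector store": chunk text -> hand-made 3-d embedding
store = {
    "chunk about maps": [0.9, 0.1, 0.0],
    "chunk about text": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]  # pretend this is the embedded question
best = max(store, key=lambda k: cosine_similarity(query, store[k]))
```

ChromaDB does the same ranking over thousands of high-dimensional vectors, with an index so it never scans every chunk.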
- 📄 Multi-document (`.txt`, `.pdf`, `.md`, `.docx`, `.doc`, `.json`) processing with intelligent chunking
- 🧠 Multi-query retrieval for better context understanding
- 🎯 Advanced RAG implementation using LangChain and Ollama
- 🔒 Complete local data processing - no data leaves your machine
- 📓 Jupyter notebook for experimentation
- 🖥️ Clean Streamlit UI
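The chunking step in the feature list splits each document into overlapping pieces before embedding, so that context straddling a chunk boundary isn't lost. A minimal fixed-size sketch (a simplification — LangChain-style splitters also respect sentence and paragraph boundaries, and the sizes here are illustrative):

```python
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size chunks where each chunk overlaps the next by `overlap` chars."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

chunks = chunk_text("0123456789" * 10)  # 100 chars -> 4 overlapping chunks
```

The overlap is what makes the chunking "intelligent" in spirit: a sentence cut at position 40 still appears whole at the start of the next chunk.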
- Visit Ollama.ai to download and install Ollama
- Open `cmd` or `terminal` and run `ollama`
- Install LLM models (locally):
  - Start with `ollama pull llama3.2`, a small (~4 GB) base model tailored for general use cases
  - For vector embeddings, pull the following:

    ```sh
    ollama pull mxbai-embed-large # or `nomic-embed-text`
    ```

- Chat with the model in the terminal:

  ```sh
  ollama run llama3.2 # or your preferred model
  ```

- Go to Ollama Models to search and pull other popular models, for example:

  ```sh
  ollama pull dolphin3
  ollama pull deepseek-r1:8b
  ollama pull mistral
  ```

- Check the list of locally available Ollama models: `ollama list`
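Once Ollama is running it also exposes a local HTTP API on port 11434, which is what LangChain's Ollama integrations call under the hood. A sketch of building such a request by hand with only the standard library (the model name and prompt are illustrative, and actually sending the request requires a running Ollama server):

```python
import json
import urllib.request

def ollama_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = ollama_request("llama3.2", "Summarise this document.")
# urllib.request.urlopen(req) would then return the model's JSON response
```

You never need to do this by hand in the app — it is just what "Ollama running in the background" provides to LangChain.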
- Open `cmd` or `terminal` and navigate to your preferred directory, then run the following:

  ```sh
  git clone https://github.com/aghoshpro/ChatDocument.git
  ```

- Go to the ChatDocument folder: `cd ChatDocument`
- Create a virtual environment `myvenv` inside the `./ChatDocument` folder and activate it:

  ```sh
  python -m venv myvenv
  .\myvenv\Scripts\activate    # Windows
  source myvenv/bin/activate   # OR (Linux or Mac)
  ```

- Install dependencies: `pip install --upgrade -r requirements.txt`
- 🧪 Experiment with the code in the `*.ipynb` notebooks: `jupyter notebook`
- Run the app:

  ```sh
  streamlit run main.py
  ```
- Ensure Ollama is running in the background
- A GPU is preferred for good performance; otherwise it runs on CPU (slower)
- `./data/sample_docs` contains a few sample documents for you to test
- Use `pip list` or `pip freeze` to check the currently installed packages
- Edit `.streamlit/config.toml` for your color preferences:

  ```toml
  [theme]
  primaryColor = "#FF4B4B"
  backgroundColor = "#0E1117"
  secondaryBackgroundColor = "#262730"
  textColor = "#FAFAFA"
  font = "sans serif"
  ```
- Open issues for bugs or suggestions
- Submit pull requests
- LangChain
- Ollama
- ChromaDB
- Streamlit
- Folium
- Unstructured
- ChromaDB Tutorial Step by Step Guide
- ChromaDB Collections