ChatDocument

[Screenshot: Streamlit Web App UI]

A Retrieval-Augmented Generation (RAG) application that lets you chat with your local documents in various formats (e.g., .txt, .pdf, .md, .docx, .doc, .json, .geojson) using Ollama LLMs and LangChain. Upload a document in the Streamlit web UI and ask questions about it. Have fun!

📂 Project Structure

├── .streamlit/
│   └── config.toml       # Streamlit configuration (OPTIONAL)
├── assets/
│   └── ui.png            # Streamlit UI image
├── components/
│   ├── __init__.py
│   ├── chat.py           # Chat interface implementation
│   └── upload.py         # Document upload handling
├── core/
│   ├── __init__.py
│   ├── embeddings.py     # Vector embeddings configuration
│   └── llm.py            # Language model setup
├── data/
│   ├── vector_store/     # To store vector embeddings in chromadb
│   └── sample_docs/      # Sample documents for testing
├── utils/
│   ├── __init__.py
│   └── helpers.py        # Utility functions
└── main.py               # Application entry point
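
The sketch below shows one way core/embeddings.py could wire Ollama embeddings to the ChromaDB store under data/vector_store/. It is a hypothetical illustration based on the project tree and the setup steps, not the repository's actual code; the langchain_ollama and langchain_chroma packages and the mxbai-embed-large model name are assumptions.

    # Hypothetical sketch of core/embeddings.py; the real file may differ.
    from langchain_ollama import OllamaEmbeddings
    from langchain_chroma import Chroma

    VECTOR_STORE_DIR = "data/vector_store"  # matches the project tree above

    def get_embeddings(model: str = "mxbai-embed-large") -> OllamaEmbeddings:
        """Return an Ollama embedding function (the model must be pulled locally)."""
        return OllamaEmbeddings(model=model)

    def get_vector_store(collection: str = "chat_document") -> Chroma:
        """Open (or create) a persistent ChromaDB collection for document chunks."""
        return Chroma(
            collection_name=collection,
            embedding_function=get_embeddings(),
            persist_directory=VECTOR_STORE_DIR,
        )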

📚 RAG Architecture

[Diagram: RAG architecture]

[Screenshot: Streamlit Web App UI]

✨ Features

  • 📄 Multi-format document processing (.txt, .pdf, .md, .docx, .doc, .json, .geojson) with intelligent chunking
  • 🧠 Multi-query retrieval for better context understanding
  • 🎯 Advanced RAG implementation using LangChain and Ollama (see the sketch after this list)
  • 🔒 Complete local data processing - no data leaves your machine
  • 📓 Jupyter notebook for experimentation
  • 🖥️ Clean Streamlit UI
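
To make the flow concrete, here is a minimal sketch of how these pieces typically fit together in LangChain: load a document, chunk it, embed the chunks into Chroma, then answer a question through a multi-query retriever backed by a local Ollama model. It is an assumed, simplified flow (the file path, model names, chunk sizes, and prompt are illustrative), not the app's exact implementation.

    # Minimal RAG sketch with LangChain + Ollama (assumed flow, not the app's exact code).
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    from langchain_ollama import ChatOllama, OllamaEmbeddings
    from langchain_chroma import Chroma
    from langchain.retrievers.multi_query import MultiQueryRetriever

    # 1. Load and chunk the document ("intelligent chunking" is approximated here
    #    by a recursive character splitter with overlap).
    docs = PyPDFLoader("data/sample_docs/example.pdf").load()   # hypothetical path
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=200
    ).split_documents(docs)

    # 2. Embed the chunks into a local Chroma vector store.
    store = Chroma.from_documents(
        chunks,
        embedding=OllamaEmbeddings(model="mxbai-embed-large"),
        persist_directory="data/vector_store",
    )

    # 3. Multi-query retrieval: the LLM rephrases the question several ways and
    #    the union of the retrieved chunks gives broader context.
    llm = ChatOllama(model="llama3.2")
    retriever = MultiQueryRetriever.from_llm(retriever=store.as_retriever(), llm=llm)

    # 4. Answer the question using only the retrieved context.
    question = "What is this document about?"
    context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))
    answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
    print(answer.content)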

🚀 Getting Started

1. Install Ollama

  • Visit Ollama.ai to download and install Ollama

  • Open a terminal (or cmd on Windows) and run ollama to verify the installation

  • Install LLM models (locally):

  • Start with ollama pull llama3.2, a lightweight general-purpose model suited for most basic use cases

  • For vector embeddings, pull one of the following:

    ollama pull mxbai-embed-large # or `nomic-embed-text`
  • Chat with the model in the terminal:

    ollama run llama3.2   # or your preferred model
  • Browse Ollama Models to search for and pull other popular models, for example:

    ollama pull dolphin3
    ollama pull deepseek-r1:8b
    ollama pull mistral
  • List the locally available Ollama models (a quick Python check of the pulled models is sketched after this list):

    ollama list
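
Optionally, you can confirm the pulled models respond from Python as well. The snippet below is a minimal smoke test using the langchain-ollama package (an assumption; any Ollama client would work) and assumes Ollama is running locally with the models suggested above.

    # Quick smoke test for the locally pulled Ollama models (assumes Ollama is running).
    from langchain_ollama import ChatOllama, OllamaEmbeddings

    reply = ChatOllama(model="llama3.2").invoke("Say hello in one short sentence.")
    print(reply.content)

    vector = OllamaEmbeddings(model="mxbai-embed-large").embed_query("hello world")
    print(f"Embedding dimension: {len(vector)}")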

2. Clone Repository

  • Open a terminal, navigate to your preferred directory, and run:

    git clone https://github.com/aghoshpro/ChatDocument.git
  • Go to the ChatDocument folder using cd ChatDocument

3. Set Up Local Environment

  • Create a virtual environment myvenv inside the ./ChatDocument folder and activate it:

    python -m venv myvenv
    # Windows
    .\myvenv\Scripts\activate
    # Linux / macOS
    source myvenv/bin/activate
  • Install dependencies:

    pip install --upgrade -r requirements.txt
  • 🧪 (Optional) Experiment with the code in the *.ipynb notebooks:

    jupyter notebook

🕹️ Run

streamlit run main.py
  • Select llama3.2 as the model and start chatting.

  • Content view: [screenshot of the Streamlit web app]

  • WordCloud view: [screenshot of the Streamlit web app]

🛠 Troubleshooting

  • Ensure Ollama is running in the background (see the connectivity check sketched after this list)
  • A GPU is preferred for good performance; CPU-only inference works but will be slower
  • ./data/sample_docs contains a few sample documents for testing
  • Use pip list or pip freeze to check the currently installed packages
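
If the app cannot reach Ollama, one quick check is to query the local server directly. The snippet below assumes Ollama's default endpoint, http://localhost:11434, and uses its /api/tags route to list the installed models.

    # Check that the local Ollama server is reachable and list the installed models.
    import json
    import urllib.request

    try:
        with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
            models = json.load(resp).get("models", [])
        print("Ollama is running. Installed models:", [m["name"] for m in models])
    except OSError as exc:
        print("Could not reach Ollama on localhost:11434. Is `ollama serve` running?", exc)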

✨ Theme Configuration

  • Edit .streamlit/config.toml to set your color preferences:

    [theme]
    primaryColor = "#FF4B4B"
    backgroundColor = "#0E1117"
    secondaryBackgroundColor = "#262730"
    textColor = "#FAFAFA"
    font = "sans serif"

🤝 Contributing

  • Open issues for bugs or suggestions
  • Submit pull requests

