A powerful Retrieval-Augmented Generation (RAG) application built with LangChain and Streamlit that enables intelligent question-answering from web-based documents.

# πŸ€– RAG LangChain Master


A sophisticated Retrieval-Augmented Generation (RAG) system that transforms any web document into an intelligent knowledge base. Ask questions and get accurate, contextual answers powered by cutting-edge AI technology.

## πŸŽ₯ Demo & Preview

### πŸ“Ή Hosted Web App

πŸ”— Live App on Streamlit

### πŸ“Ή Live Demo Video

RAG Demo Video

*Click the image above to watch the full demo on YouTube.*

### πŸ“Έ Application Preview

RAG Application Interface

*Interactive web interface for document loading and AI-powered question answering*

## ✨ Key Features

- 🌐 **Universal Web Scraping** - Load documents from any URL
- 🧠 **Smart Document Processing** - Intelligent text chunking with overlap
- πŸ” **Semantic Search** - Vector-based similarity search
- πŸ’¬ **GPT-4 Integration** - State-of-the-art answer generation
- πŸ‘οΈ **Full Transparency** - View source documents for every answer
- ⚑ **Real-time Processing** - Instant document indexing and querying
- 🎨 **Modern UI** - Clean, responsive Streamlit interface

## πŸš€ Quick Start

### 1. Clone & Setup

```bash
git clone https://github.com/yourusername/rag-langchain-master.git
cd rag-langchain-master
```

### 2. Environment Setup

```bash
# Create a virtual environment
python -m venv venv

# Activate it (Windows)
venv\Scripts\activate
# Activate it (macOS/Linux)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

### 3. Configure API Key

Create a `.env` file in the project root:

```
OPENAI_API_KEY=your_openai_api_key_here
```
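How does the key get from `.env` into the app? Typically via the `python-dotenv` package; as a dependency-free illustration of what such a loader does (a sketch, not the app's actual code):

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: KEY=value lines become environment variables.

    Illustrative only - a real app would normally use python-dotenv.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                # Don't clobber variables already set in the real environment
                os.environ.setdefault(key.strip(), value.strip())
```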

### 4. Launch Application

```bash
streamlit run apps/web_rag.py
```

Navigate to http://localhost:8501 and start asking questions! πŸŽ‰

## πŸ“Š How It Works

```mermaid
graph LR
    A[Web URL] --> B[Document Loader]
    B --> C[Text Splitter]
    C --> D[Embeddings]
    D --> E[Vector Store]
    F[User Question] --> G[Retriever]
    G --> E
    E --> H[Relevant Chunks]
    H --> I[GPT-4]
    I --> J[Final Answer]
```
1. πŸ“„ **Document Ingestion**: WebBaseLoader extracts content from URLs
2. βœ‚οΈ **Text Chunking**: Smart splitting with configurable overlap
3. πŸ”’ **Vectorization**: HuggingFace embeddings create semantic representations
4. πŸ—„οΈ **Storage**: In-memory vector database for lightning-fast retrieval
5. πŸ” **Retrieval**: Semantic search finds the most relevant content
6. πŸ€– **Generation**: GPT-4 synthesizes accurate answers with context
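To make the retrieval step concrete, here is a dependency-free sketch of similarity-based retrieval. Toy 3-dimensional vectors stand in for real embeddings; the app itself uses HuggingFace embeddings and LangChain's retriever.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunk_vecs, chunks, k=2):
    """Rank chunks by similarity to the query and return the top k."""
    ranked = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:k]]

# Toy "embeddings" standing in for real model output
chunks = ["pricing page", "installation guide", "api reference"]
vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
query = [0.2, 0.8, 0.2]  # a question about installation

print(retrieve(query, vectors, chunks, k=1))  # -> ['installation guide']
```

The retrieved chunks are then stuffed into the LLM prompt as context for answer generation.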

πŸ› οΈ Tech Stack

Component Technology Purpose
Framework LangChain LLM application orchestration
Frontend Streamlit Interactive web interface
LLM OpenAI GPT-4 Answer generation
Embeddings HuggingFace Transformers Semantic text representation
Vector DB In-Memory Store Fast similarity search
Loader WebBaseLoader Document extraction

πŸ“ Project Structure

rag-langchain-master/
β”œβ”€β”€ πŸ“± apps/
β”‚   └── web_rag.py              # Main Streamlit application
β”œβ”€β”€ πŸ“‹ requirements.txt         # Python dependencies
β”œβ”€β”€ πŸ”’ .env                     # Environment variables
β”œβ”€β”€ πŸ“– README.md               # This file
β”œβ”€β”€ πŸ“„ LICENSE                 # MIT license
└── πŸ“Έ screenshots/            # Demo images
    └── app-preview.png

βš™οΈ Configuration Options

Text Splitting Parameters

RecursiveCharacterTextSplitter(
    chunk_size=1000,           # Characters per chunk
    chunk_overlap=200,         # Overlap between chunks
    separators=["\n\n", "\n", " ", ""]  # Split priorities
)
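What the overlap buys you: consecutive chunks share 200 characters, so text falling on a chunk boundary survives intact in at least one chunk. LangChain's splitter additionally prefers the listed separators, but the underlying sliding-window arithmetic looks like this (simplified sketch):

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-size windows that advance by (chunk_size - chunk_overlap) characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 2,500-character document yields windows starting at 0, 800, 1600, 2400
doc = "".join(chr(97 + i % 26) for i in range(2500))
pieces = sliding_chunks(doc)
print(len(pieces))                          # -> 4
print(pieces[0][-200:] == pieces[1][:200])  # -> True: the shared overlap
```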

### Embedding Model

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```

### LLM Configuration

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4")  # Configurable model
```

## 🎯 Use Cases

- πŸ“š **Research Assistant**: Query academic papers and documentation
- πŸ“° **News Analysis**: Extract insights from news articles
- πŸ“‹ **Policy Documents**: Navigate complex legal/policy texts
- 🏒 **Corporate Knowledge**: Build internal knowledge bases
- πŸ“– **Educational Content**: Interactive learning from web resources

## πŸ”§ Advanced Features

### Custom Document Types

Extend the loader to support PDFs, Word documents, and more:

```python
from langchain_community.document_loaders import PyPDFLoader
# Implementation details...
```

### Persistent Storage

Upgrade to a persistent vector database:

```python
from langchain_community.vectorstores import Chroma
# Implementation details...
```
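Why bother: the in-memory store must re-embed every document on each restart, whereas a persistent store such as Chroma (pointed at a `persist_directory`) keeps vectors on disk between runs. A stdlib-only sketch of the save/reload idea (not Chroma's actual format):

```python
import json

def save_index(path, chunks, vectors):
    """Write chunks and their embedding vectors to disk as JSON."""
    with open(path, "w") as f:
        json.dump({"chunks": chunks, "vectors": vectors}, f)

def load_index(path):
    """Reload a saved index instead of re-embedding everything."""
    with open(path) as f:
        data = json.load(f)
    return data["chunks"], data["vectors"]
```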

### Multi-Model Support

Switch between different LLMs:

```python
from langchain_community.llms import Ollama
# Implementation details...
```

## 🀝 Contributing

We welcome contributions! Here's how to get started:

1. 🍴 Fork the repository
2. 🌿 Create a feature branch (`git checkout -b feature/amazing-feature`)
3. πŸ’« Commit your changes (`git commit -m 'Add amazing feature'`)
4. πŸš€ Push to the branch (`git push origin feature/amazing-feature`)
5. πŸ“¬ Open a Pull Request

### Development Guidelines

- Follow PEP 8 style guidelines
- Add docstrings to all functions
- Include unit tests for new features
- Update documentation as needed

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

## πŸ†˜ Troubleshooting

### Common Issues

**ImportError: No module named 'streamlit'**

```bash
pip install -r requirements.txt
```

**OpenAI API Key Error**

```bash
# Ensure a .env file exists with a valid API key
echo "OPENAI_API_KEY=your_key_here" > .env
```

### Performance Issues

- Use smaller chunk sizes for faster processing
- Consider using lighter embedding models
- Implement caching for frequently accessed documents
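On the caching point: in a Streamlit app the idiomatic tool is `st.cache_resource` on the function that builds the index. The effect is plain memoization, shown here with `functools.lru_cache` so the sketch runs without Streamlit:

```python
from functools import lru_cache

call_count = {"n": 0}  # tracks how often the expensive work actually runs

@lru_cache(maxsize=32)
def build_index(url):
    """Stand-in for the expensive load -> chunk -> embed pipeline."""
    call_count["n"] += 1
    return f"index({url})"

build_index("https://example.com/doc")
build_index("https://example.com/doc")  # cache hit: pipeline not re-run
print(call_count["n"])  # -> 1
```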

πŸ™ Acknowledgments

  • 🦜 LangChain - Powerful LLM framework
  • πŸ€– OpenAI - GPT-4 API access
  • πŸ€— HuggingFace - Open-source transformers
  • 🎈 Streamlit - Rapid web app development
  • 🌟 Open Source Community - Continuous inspiration
