🤖 RAG LangChain Master

A Retrieval-Augmented Generation (RAG) system that turns a web document into a queryable knowledge base. Load a URL, ask questions, and get answers that GPT-4 generates from the most relevant passages of that document.

🎥 Demo & Preview

📹 Hosted Web App

🔗 Live App on Streamlit

📹 Live Demo Video

RAG Demo Video (the full demo is on YouTube)

📸 Application Preview

Interactive web interface for document loading and AI-powered question answering

✨ Key Features

🌐 Web Document Loading - Load page content from a URL
🧠 Document Processing - Text chunking with configurable overlap
🔍 Semantic Search - Vector-based similarity search
💬 GPT-4 Integration - Answers generated from the retrieved context
👁️ Full Transparency - View source documents for every answer
⚡ In-Session Processing - Documents are indexed in memory and can be queried as soon as they load
🎨 Modern UI - Clean, responsive Streamlit interface

🚀 Quick Start

1. Clone & Setup

git clone https://github.com/yourusername/rag-langchain-master.git
cd rag-langchain-master

2. Environment Setup

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate
# Activate (macOS/Linux)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

3. Configure API Key

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_api_key_here
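The README does not show how the app reads this file; a common choice is the python-dotenv package. As a dependency-free sketch of the same idea, the helpers below parse KEY=VALUE lines and fail early if the key is missing or still set to the placeholder. The names `load_env_file` and `require_key` are hypothetical and are not taken from the project.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_key(values: dict, name: str = "OPENAI_API_KEY") -> str:
    """Prefer the process environment, fall back to the parsed file, else fail."""
    key = os.environ.get(name) or values.get(name, "")
    if not key or key == "your_openai_api_key_here":
        raise RuntimeError(f"{name} is missing or still set to the placeholder")
    return key
```

Calling `require_key(load_env_file())` at startup surfaces a misconfigured key before the first request to the API rather than in the middle of a query.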

4. Launch Application

streamlit run apps/web_rag.py

Navigate to http://localhost:8501 and start asking questions! 🎉

📊 How It Works

graph LR
    A[Web URL] --> B[Document Loader]
    B --> C[Text Splitter]
    C --> D[Embeddings]
    D --> E[Vector Store]
    F[User Question] --> G[Retriever]
    G --> E
    E --> H[Relevant Chunks]
    H --> I[GPT-4]
    I --> J[Final Answer]

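📄 Document Ingestion: WebBaseLoader extracts page content from the given URL
✂️ Text Chunking: RecursiveCharacterTextSplitter splits the text into chunks (1000 characters with a 200-character overlap, see Configuration Options)
🔢 Vectorization: HuggingFace embeddings (sentence-transformers/all-MiniLM-L6-v2) turn each chunk into a vector
🗄️ Storage: Vectors are held in an in-memory vector store, so the index lasts only for the running session
🔍 Retrieval: Semantic search returns the chunks most similar to the question
🤖 Generation: GPT-4 answers the question using the retrieved chunks as context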
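The retrieval and generation steps above can be illustrated without any external services. The sketch below is a toy, not the project's code: it swaps the HuggingFace embedding model for plain word counts and stops at building the prompt instead of calling GPT-4. All function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n\n{joined}\n\nQuestion: {question}"


chunks = [
    "Streamlit serves the web interface on port 8501.",
    "Chunks overlap by 200 characters so sentences are not cut off.",
    "The vector store keeps embeddings in memory.",
]
question = "Which port does Streamlit use?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

Here the first chunk ranks highest because it is the only one sharing words with the question. A real embedding model matches on meaning rather than exact words, which is why the project uses one.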

🛠️ Tech Stack

Component	Technology	Purpose
Framework	LangChain	LLM application orchestration
Frontend	Streamlit	Interactive web interface
LLM	OpenAI GPT-4	Answer generation
Embeddings	HuggingFace Transformers	Semantic text representation
Vector DB	In-Memory Store	Fast similarity search
Loader	WebBaseLoader	Document extraction

📁 Project Structure

rag-langchain-master/
├── 📱 apps/
│   └── web_rag.py              # Main Streamlit application
├── 📋 requirements.txt         # Python dependencies
├── 🔒 .env                     # Environment variables
├── 📖 README.md               # This file
├── 📄 LICENSE                 # MIT license
└── 📸 screenshots/            # Demo images
    └── app-preview.png

⚙️ Configuration Options

Text Splitting Parameters

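# Import path assumes the langchain-text-splitters package; older releases use langchain.text_splitter
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,           # Maximum characters per chunk
    chunk_overlap=200,         # Characters shared between consecutive chunks
    separators=["\n\n", "\n", " ", ""]  # Tried in order: paragraphs, lines, words, characters
)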
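To see how the chunk size and overlap interact, here is a simplified fixed-width sliding window. It is only a sketch of the overlap arithmetic: the LangChain splitter additionally prefers to break at the listed separators, which this version ignores. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(text)
lengths = [len(c) for c in chunks]
```

With the default values, a 2,500-character input produces three chunks of 1000, 1000, and 900 characters, and the last 200 characters of each chunk reappear at the start of the next one.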

Embedding Model

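# Import path is an assumption; newer releases also ship this class in langchain_huggingface
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)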

LLM Configuration

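from langchain_openai import ChatOpenAI  # Import path is an assumption
llm = ChatOpenAI(model_name="gpt-4")  # Change model_name to use a different OpenAI model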

🎯 Use Cases

📚 Research Assistant: Query academic papers and documentation
📰 News Analysis: Extract insights from news articles
📋 Policy Documents: Navigate complex legal/policy texts
🏢 Corporate Knowledge: Build internal knowledge bases
📖 Educational Content: Interactive learning from web resources

🔧 Advanced Features

Custom Document Types

Extend the loader to support PDFs, Word docs, and more:

from langchain_community.document_loaders import PyPDFLoader
# Implementation details...
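One possible way to wire this in is to pick the loader from the source's file extension. This is a sketch under assumptions, not the project's implementation: `loader_kind` and `load_documents` are hypothetical names, the extension check is a simple heuristic, and `PyPDFLoader` needs the `pypdf` package installed. The LangChain imports are deferred so the classification helper can be used on its own.

```python
from urllib.parse import urlparse


def loader_kind(source: str) -> str:
    """Classify a source as 'pdf' or 'web' from its path suffix (a simple heuristic)."""
    path = urlparse(source).path or source
    return "pdf" if path.lower().endswith(".pdf") else "web"


def load_documents(source: str):
    """Load a source with the matching LangChain loader; imports are deferred."""
    if loader_kind(source) == "pdf":
        from langchain_community.document_loaders import PyPDFLoader
        return PyPDFLoader(source).load()
    from langchain_community.document_loaders import WebBaseLoader
    return WebBaseLoader(source).load()
```

Both loaders return a list of LangChain `Document` objects, so the splitting and embedding steps downstream do not need to know which loader ran.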

Persistent Storage

Upgrade to persistent vector databases:

from langchain_community.vectorstores import Chroma
# Implementation details...
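A rough sketch of what persistence could look like with Chroma: build the index on the first run, then reopen it from disk afterwards. The helper names and the `chroma_db` directory are assumptions made for illustration, and the `chromadb` package must be installed for the deferred import to succeed.

```python
from pathlib import Path


def has_saved_index(persist_dir: str) -> bool:
    """True if the directory exists and contains at least one entry."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())


def open_or_build_store(chunks, embeddings, persist_dir: str = "chroma_db"):
    """Reopen a saved Chroma index, or build one from chunks on first run."""
    from langchain_community.vectorstores import Chroma  # requires chromadb

    if has_saved_index(persist_dir):
        # Reuse the vectors already on disk; no re-embedding needed
        return Chroma(persist_directory=persist_dir, embedding_function=embeddings)
    return Chroma.from_documents(
        documents=chunks, embedding=embeddings, persist_directory=persist_dir
    )
```

The directory check is deliberately crude. It only tells whether something was saved before, not whether the saved index matches the current document or embedding model.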

Multi-Model Support

Switch between different LLMs:

from langchain_community.llms import Ollama
# Implementation details...
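Switching models could be handled by a small factory function. The sketch below is illustrative only: `make_llm` is a hypothetical name, the `langchain_openai` import path and the `llama3` default are assumptions, and the Ollama branch expects a local Ollama server with that model pulled.

```python
def make_llm(provider: str = "openai", model: str = ""):
    """Return a LangChain model object for the chosen provider."""
    if provider == "openai":
        from langchain_openai import ChatOpenAI  # assumed import path
        return ChatOpenAI(model_name=model or "gpt-4")
    if provider == "ollama":
        from langchain_community.llms import Ollama
        return Ollama(model=model or "llama3")  # default model name is an assumption
    raise ValueError(f"Unknown provider: {provider!r}")
```

Both objects support `.invoke(...)`, but the chat model returns a message object while the Ollama LLM class returns a plain string, so the calling code may need to normalize the result.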

🤝 Contributing

We welcome contributions! Here's how to get started:

🍴 Fork the repository
🌿 Create a feature branch (git checkout -b feature/amazing-feature)
💫 Commit your changes (git commit -m 'Add amazing feature')
🚀 Push to the branch (git push origin feature/amazing-feature)
📬 Open a Pull Request

Development Guidelines

Follow PEP 8 style guidelines
Add docstrings to all functions
Include unit tests for new features
Update documentation as needed

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Troubleshooting

Common Issues

ImportError: No module named 'streamlit'

pip install -r requirements.txt

OpenAI API Key Error

# Create .env with a valid API key (note: > overwrites any existing .env file)
echo "OPENAI_API_KEY=your_key_here" > .env

Performance Issues

Use smaller chunk sizes for faster processing
Consider using lighter embedding models
Implement caching for frequently accessed documents
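As a minimal sketch of the caching suggestion, the example below memoizes the per-URL indexing step with `functools.lru_cache`. The function `fetch_and_index` is a placeholder standing in for the real load, split, and embed work; it is not part of the project. In a Streamlit app, the built-in `st.cache_resource` decorator plays a similar role.

```python
from functools import lru_cache

FETCH_COUNT = {"calls": 0}


def fetch_and_index(url: str) -> tuple:
    """Placeholder for the expensive load/split/embed step (hypothetical)."""
    FETCH_COUNT["calls"] += 1
    return (url, f"index for {url}")


@lru_cache(maxsize=32)
def cached_index(url: str) -> tuple:
    """Build the index once per URL and reuse it on later calls."""
    return fetch_and_index(url)


first = cached_index("https://example.com/a")
second = cached_index("https://example.com/a")  # served from the cache
```

The second call returns the same object without running the expensive step again. The trade-off is staleness: if the page changes, `cached_index.cache_clear()` is needed to force a reload.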

🙏 Acknowledgments

🦜 LangChain - Powerful LLM framework
🤖 OpenAI - GPT-4 API access
🤗 HuggingFace - Open-source transformers
🎈 Streamlit - Rapid web app development
🌟 Open Source Community - Continuous inspiration

anujayavidmal2002/RAG---DocumentURL-QandA
