An AI-powered document search and conversational question-answering system that enables users to upload documents and interact with them through a chat-based interface. The system uses semantic search and retrieval-augmented generation (RAG) to ensure accurate, context-aware responses strictly based on the uploaded content.
- Upload and process multiple PDF and TXT documents
- Semantic search using vector embeddings
- Conversational question-answering interface
- Answers restricted strictly to document context
- OCR fallback for scanned or image-based PDFs
- Modern dark UI with smooth hover effects
- Fast and efficient vector search using ChromaDB
- Modular and scalable architecture
- Frontend: Streamlit
- LLM: Meta LLaMA 3.2 Instruct (Hugging Face)
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
- Vector Database: ChromaDB
- Framework: LangChain
- PDF Processing: PyPDF2, Unstructured
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name

- User uploads PDF or TXT documents
- Text is extracted using PyPDF2 (with OCR fallback for scanned PDFs)
- Text is split into smaller chunks
- Vector embeddings are generated using MiniLM
- Embeddings are stored in ChromaDB
- Relevant chunks are retrieved for each user query
- The LLaMA model generates a response strictly from the retrieved context
- Upload one or more documents
- Click Process Documents
- Ask questions using the chat input
- Receive accurate answers based only on document content
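The chunking, embedding, storage, and retrieval steps above can be sketched in plain Python. This is a minimal illustration only, not the app's actual code: `toy_embed` is a feature-hashing stand-in for the `all-MiniLM-L6-v2` model, and `ToyVectorStore` is an in-memory stand-in for ChromaDB, but the cosine-similarity ranking is the same idea the real vector search uses.

```python
import hashlib
import math

def chunk_text(text, size=200, overlap=40):
    """Split text into overlapping chunks (the splitting step above)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def toy_embed(text, dims=256):
    """Stand-in for the MiniLM embedder: hash words into a unit vector."""
    vec = [0.0] * dims
    for word in text.lower().split():
        word = word.strip(".,?!:;")
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity; vectors are already unit-normalized."""
    return sum(x * y for x, y in zip(a, b))

class ToyVectorStore:
    """Stand-in for ChromaDB: keeps (chunk, embedding) pairs in memory."""
    def __init__(self):
        self.items = []

    def add(self, chunks):
        for c in chunks:
            self.items.append((c, toy_embed(c)))

    def query(self, question, k=2):
        """Return the k chunks most similar to the query embedding."""
        q = toy_embed(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

store = ToyVectorStore()
store.add(chunk_text("ChromaDB stores the vector embeddings for every document chunk."))
store.add(chunk_text("Streamlit renders the chat interface in the browser."))
top = store.query("which component stores the vector embeddings", k=1)
```

In the deployed app, these roles are filled by the real components: MiniLM produces the embeddings, ChromaDB persists and searches them, and LangChain wires retrieval into the LLM call.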
If the answer is not found in the documents, the system responds with:
"I don't know"
- Push the project to GitHub
- Go to https://streamlit.io/cloud
- Connect your GitHub repository
- Add HF_TOKEN under Secrets
- Deploy the application
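On Streamlit Cloud, values added under Secrets are exposed to the app through `st.secrets`. A small hedged sketch (the helper name is illustrative), with an environment-variable fallback so the same code also works in local runs:

```python
import os

def get_hf_token():
    """Read the Hugging Face token from Streamlit secrets when deployed,
    falling back to the HF_TOKEN environment variable locally."""
    try:
        import streamlit as st  # only usable inside a Streamlit app
        if "HF_TOKEN" in st.secrets:
            return st.secrets["HF_TOKEN"]
    except Exception:
        pass  # streamlit not installed, or no secrets file configured
    return os.environ.get("HF_TOKEN")
```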
- Academic research and study assistance
- Knowledge-base chatbots
- Document and report analysis
- Legal and policy document exploration
- Resume and portfolio document querying