History of Nepal Chatbot

Overview

The History of Nepal Chatbot is an intelligent conversational agent designed to provide users with accurate, in-depth, and engaging information about the history of Nepal. It integrates multiple advanced technologies and tools to ensure high-quality responses, leveraging retrieval-augmented generation (RAG) techniques, knowledge graphs, and internet-based searches.

Features

Core Functionalities

Knowledge Sources:
- Qdrant: For dense retrieval using embeddings.
- Knowledge Graph (Neo4j): To explore relationships among historical entities.
- Google Search: To supplement responses with up-to-date information.
Conversation Handling:
- Session-based interactions using LangChain.
- Stores old conversations for context-aware responses.
Custom Agent:
- Orchestrates workflows to retrieve and synthesize data seamlessly.
Data Persistence:
- Query and response storage in PostgreSQL with session IDs and timestamps.

Infrastructure

Fully Dockerized architecture for ease of deployment.
Integration of multiple services:
- Ollama for language modeling.
- LangChain for query handling.
- PostgreSQL for database storage.
- Neo4j for graph-based data representation.
- Qdrant for vector-based search.

Additional Features

Multi-source validation:
- Google Search validates outputs from Qdrant and Neo4j.
- Replaces inconsistent or outdated information with Google search results.
Session Management:
- Tracks and retains user sessions for enhanced interaction.
Time-Stamped Logs:
- Maintains a history of interactions for user reviews and analytics.

Technology Stack

Backend

Ollama: Language model for generating human-like responses.
LangChain: Manages interaction workflows and context handling.

Data Management

Qdrant: Vector database for document retrieval.
Neo4j: Graph database for entity relationships.
PostgreSQL: Stores query-response pairs with session data.

Deployment

Docker: Containers for all components, ensuring portability and scalability.

External Services

Google Search API: Provides supplementary information and validation.

Installation and Setup

Prerequisites

Docker
Docker Compose

Steps

Clone the repository:

git clone <repository_url>
cd <repository_directory>

Create a .env file for environment variables. Example:
- Take reference from env_description which will be required to run this entire project,
Build and start the Docker containers:
```
docker-compose up --build
```
Access the chatbot service via the designated API endpoint or UI (if available).

Run the program

Step 1: Parse the document. Use preprocessing/parse.py which will parse the document using llama parse and store it in markdown format
Step 2: Then to extract the document and chunk them use cmd/extraction_script.py which will chunk the parsed document.
Step 3: To push to the qdrant database and perform embedding and perform hybrid search use cmd/hybrid_search_script.py
Step 4: To push to the neo4j database and perform embedding and perform retrieveral from graph database use cmd/ neo4j_graph_script.py
Step 5: To run agent use agents/custom_agent.py

Technologies used

Llama3.2 3B
langchain agent
neo4j
embedding model (mxbai-embed-large)
sparse embedding (Qdrant/bm25)
NLLB 600M from ctranslate2
rerank cohere

Usage

Ask Historical Questions:
- Example: "What is the significance of the Battle of Kirtipur?"
- The chatbot will retrieve information from Qdrant, Neo4j, and validate with Google Search.
Explore Relationships:
- Example: "What was the relationship between Prithvi Narayan Shah and the British East India Company?"
Session Continuity:
- Engage in conversations where the chatbot retains the context of previous queries.

API Endpoints

1. Chat

Endpoint: /api/chat
Method: POST

Payload:

{
  "query": "What is the history of Lumbini?",
  "session_id": "12345"
}

2. Retrieve Old Conversations

Endpoint: /api/conversations
Method: GET
Query Parameters:
- session_id: Session identifier

System Architecture

User Query:
- Input query received via API or UI.
Retrieval:
- Query sent to Qdrant for vector-based search.
- Query sent to Neo4j for graph traversal.
Validation and Augmentation:
- Google Search API validates and supplements retrieved data.
Response Generation:
- Data synthesized using Ollama and LangChain.
- Response stored in PostgreSQL with session ID and timestamp.
Final Output:
- Context-aware and validated response delivered to the user.

Future Enhancements

Interactive Visualizations:
- Timelines, maps, and graphs for richer user experiences.
Voice Integration:
- Enable voice-based queries and responses.
AI-Driven Insights:
- Predictive analytics and insights about historical trends.

Contributing

Contributions are welcome! Please follow the steps below:

Fork the repository.
Create a new branch:
```
git checkout -b feature-branch-name
```
Commit your changes and open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

The creators of Ollama, LangChain, Qdrant, Neo4j, and PostgreSQL for providing robust tools.
Google Search API for real-time supplementary information.

For any issues or queries, please contact [[email protected]].

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
agents		agents
app		app
cmd		cmd
data		data
preprocessing		preprocessing
prompt_templates		prompt_templates
retrievers		retrievers
translation		translation
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
docker-compose.yaml		docker-compose.yaml
env_description.md		env_description.md
essential_notes.md		essential_notes.md
readme.md		readme.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

History of Nepal Chatbot

Overview

Features

Core Functionalities

Infrastructure

Additional Features

Technology Stack

Backend

Data Management

Deployment

External Services

Installation and Setup

Prerequisites

Steps

Run the program

Technologies used

Usage

API Endpoints

1. Chat

2. Retrieve Old Conversations

System Architecture

Future Enhancements

Contributing

License

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Languages

MrBinit/SaharaAI

Folders and files

Latest commit

History

Repository files navigation

History of Nepal Chatbot

Overview

Features

Core Functionalities

Infrastructure

Additional Features

Technology Stack

Backend

Data Management

Deployment

External Services

Installation and Setup

Prerequisites

Steps

Run the program

Technologies used

Usage

API Endpoints

1. Chat

2. Retrieve Old Conversations

System Architecture

Future Enhancements

Contributing

License

Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages