A multilingual RAG (Retrieval-Augmented Generation) chatbot system that provides encyclopedic knowledge across multiple languages using local LLM inference.
The Encyclopaedic Polyglot Machine (EPM) is designed to provide accurate, encyclopedic information across multiple languages without requiring an internet connection or API keys. It uses local LLM inference combined with specialized vector databases for each supported language, enabling users to get high-quality information in their preferred language.
Key capabilities include:
- Answering factual questions with proper citations
- Switching between languages seamlessly
- Maintaining conversation context across language switches
- Admin interface for managing documents and users
- Local operation with no data sent to external services
- Frontend: Flask templates with responsive design
- Backend: Python Flask server
- LLM: Llama models via Ollama for local inference
- Vector Database: ChromaDB for efficient document retrieval
- Embeddings: Multilingual MiniLM for cross-language semantic search
- Document Processing: PDF, DOCX, and TXT processing pipeline
- User Management: SQLite database for authentication and chat history
- Containerization: Docker for cross-platform deployment
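To make the retrieval layer concrete, here is a minimal sketch of cross-language search against a language-specific ChromaDB collection. It assumes a standard multilingual MiniLM checkpoint and a hypothetical per-language collection naming scheme; the repository's actual identifiers may differ:

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Multilingual MiniLM embeds queries from different languages into one shared
# vector space, so a French question can match English source passages.
# (The exact checkpoint is an assumption, not the repository's pinned model.)
embedder = SentenceTransformer(
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)

client = chromadb.PersistentClient(path="chroma_db")
# Hypothetical naming: one collection per supported language
collection = client.get_or_create_collection("encyclopedia_fr")

query_vec = embedder.encode("Qui a conçu la tour Eiffel ?").tolist()
hits = collection.query(query_embeddings=[query_vec], n_results=4)
for doc in hits["documents"][0]:
    print(doc[:80])
```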
- Multilingual support with language-specific vector databases
- Local LLM inference using Ollama and Llama models
- Persistent chat history and user management
- Admin dashboard for system management and document uploads
- Vector search across encyclopedic knowledge sources
- Wikipedia integration for additional knowledge retrieval
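The README doesn't pin down how the Wikipedia integration is implemented; the sketch below assumes the `wikipedia` PyPI package and shows one way a language-aware fallback lookup could work:

```python
import wikipedia  # pip install wikipedia

def wiki_fallback(query: str, lang: str = "en", sentences: int = 3) -> str:
    """Hypothetical helper: fetch a short Wikipedia summary in the
    requested language when local vector search has no strong match."""
    wikipedia.set_lang(lang)
    try:
        return wikipedia.summary(query, sentences=sentences)
    except wikipedia.exceptions.DisambiguationError as exc:
        # Ambiguous query: fall back to the first suggested page
        return wikipedia.summary(exc.options[0], sentences=sentences)

print(wiki_fallback("Tour Eiffel", lang="fr"))
```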
You can run the EPM system using one of these two methods:
The fastest way to get started with minimal setup:
```bash
# Pull the image directly from Docker Hub
docker pull eishaenan/polyglot_app:latest

# Run the container
docker run -p 5001:5000 -p 11434:11434 \
  -v polyglot_data:/app/data \
  -v polyglot_chroma:/app/chroma_db \
  -v polyglot_ollama:/root/.ollama \
  eishaenan/polyglot_app:latest
```

This will:
- Pull the pre-built Docker image with all dependencies
- Start Ollama and download the Llama model if needed (~8GB download on first run)
- Initialize the authentication database with default admin user
- Build the vector database from the data sources
- Start the Flask web application
Note: The initial startup may take several minutes while the model downloads and the vector database is built. Docker performance may also be slower than a native installation, since LLM inference inside the container is CPU-only.
Access the Application:
- Web Interface: http://localhost:5001
- Default Admin Credentials:
  - Username: `admin`
  - Password: `admin123`
If you prefer to run without Docker:
- Clone the Repository

  ```bash
  git clone https://github.com/wsu-comp3018/final-system-pa2509.git
  cd final-system-pa2509
  ```
- Create and Activate a Virtual Environment

  ```bash
  python3 -m venv venv
  source venv/bin/activate   # Mac/Linux
  venv\Scripts\activate      # Windows (Command Prompt)
  ```
- Install Dependencies

  ```bash
  pip3 install -r requirements.txt
  ```
- Install Ollama (If Not Installed Yet)
  - Download from https://ollama.com
  - For Linux:

    ```bash
    curl -fsSL https://ollama.com/install.sh | sh
    ```
- Start Ollama in a Separate Terminal

  ```bash
  ollama serve
  ```
- Pull the Required Model

  ```bash
  ollama pull llama3.1:8b-instruct-q8_0
  ```

  Note: This will download approximately 8GB of data and may take some time depending on your internet connection.
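  Optionally, confirm that Ollama is serving and the model finished downloading by querying its local REST API (`/api/tags` lists the models that have been pulled); a quick check from Python:

  ```python
  import json
  import urllib.request

  # Ask the local Ollama server which models it has pulled
  with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
      models = [m["name"] for m in json.load(resp)["models"]]

  print("llama3.1:8b-instruct-q8_0" in models)  # should print True
  ```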
- Run the Flask Application

  ```bash
  python src/app.py
  ```
- Access the Application
  - Web Interface: http://localhost:5000
  - Default Admin Credentials:
    - Username: `admin`
    - Password: `admin123`
- Flask: Web framework for the application
- LangChain: Orchestrates the RAG pipeline
- ChromaDB: Vector database for document storage and retrieval
- Ollama: Local LLM inference using Llama models
- SQLite: Database for chat history and user management
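As a rough illustration of how these pieces compose (a sketch under assumed model names and paths, not the repository's exact code), a LangChain RAG chain over one language's ChromaDB store might look like this:

```python
from langchain.chains import RetrievalQA
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma

# Assumed embedding model and store path for illustration only
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
)
store = Chroma(persist_directory="chroma_db/en", embedding_function=embeddings)
llm = Ollama(model="llama3.1:8b-instruct-q8_0")

qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=store.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,  # keep retrieved passages for citations
)
result = qa.invoke({"query": "Who designed the Eiffel Tower?"})
print(result["result"])
```

Passing `return_source_documents=True` keeps the retrieved passages alongside the generated answer, which is how a chain like this can attach the citations mentioned above.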
The system is fully dockerized for easy deployment across platforms:
- Persistent Volumes:
  - `chat_db`: Stores the SQLite database for chat history and user data
  - `chroma_db`: Stores vector embeddings (3.5GB+)
  - `ollama_models`: Stores downloaded LLM models
- Ports:
  - `5001`: Web interface (Flask)
  - `11434`: Ollama API
- Cross-Platform Support:
  - Works on Linux, macOS, and Windows with Docker installed
  - All paths are relative and compatible across operating systems
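For reference, a minimal docker-compose.yml consistent with the ports and volumes above might look like the following; the service name and container paths are assumptions carried over from the docker run example earlier, not the repository's actual file:

```yaml
services:
  polyglot:
    image: eishaenan/polyglot_app:latest
    ports:
      - "5001:5000"      # Web interface (Flask)
      - "11434:11434"    # Ollama API
    volumes:
      - chat_db:/app/data            # SQLite chat history and user data
      - chroma_db:/app/chroma_db     # Vector embeddings
      - ollama_models:/root/.ollama  # Downloaded LLM models

volumes:
  chat_db:
  chroma_db:
  ollama_models:
```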
Common docker-compose commands:

```bash
docker-compose build     # Build the image
docker-compose up        # Start the stack in the foreground
docker-compose up -d     # Start in detached mode
docker-compose down      # Stop and remove the containers
docker-compose logs -f   # Follow the logs
```

If you're contributing to the project, follow these guidelines:
- Clone the repository and set up a local Python environment as described above
- Make sure you have Ollama installed locally for testing
- Create a feature branch for your changes
- Test thoroughly before committing
- Update documentation as needed
- Test your changes with Docker to ensure cross-platform compatibility
- Verify that volumes are properly persisted
- Include detailed description of changes
- Reference any related issues
- Port Conflicts: If port 5001 or 11434 is already in use, modify the port mapping in docker-compose.yml.
- Vector Database Building: The first run will take time to build the vector database. Subsequent runs will be faster.
- Model Download: The first run will download the Llama model (~8GB), which may take time depending on your internet connection.
This project is licensed under the MIT License - see the LICENSE file for details.
- The LangChain team for their excellent RAG framework
- The Ollama project for making local LLM inference accessible
- All contributors to the project
