Jitenderkumar2030/exostack

ExoStack - Production-Ready Distributed AI Orchestration Platform

License: MIT · Python 3.8+ · Docker · Kubernetes

ExoStack is an enterprise-grade distributed AI orchestration platform that enables seamless deployment, management, and scaling of AI models across multiple nodes. Built for production workloads, it provides intelligent scheduling, comprehensive monitoring, and multi-tenant isolation for AI inference at scale.

🚀 Key Features

Core Platform

  • 🏗️ Distributed Architecture: Horizontally scalable AI inference across multiple nodes
  • 🧠 Model Registry: Centralized model management with auto-loading from HuggingFace, local files, and remote URLs
  • ⚡ GPU Agent Detection: Automatic detection and intelligent routing to GPU-enabled nodes
  • 📡 Streaming Inference: Real-time inference via Server-Sent Events (SSE) and WebSocket
  • 🔧 Fine-tuning Integration: Built-in support for model fine-tuning with popular frameworks

Production-Ready Features

  • 💾 Persistent Task Storage: PostgreSQL/SQLite backend with comprehensive task tracking and metrics
  • 🎯 Model-Aware Scheduling: Intelligent node selection based on model compatibility and performance history
  • 📦 Advanced Model Packaging: Multi-source model support with caching, preloading, and validation
  • 📊 Web Dashboard: Modern React-based dashboard with real-time monitoring and analytics
  • 🔒 Worker Pool & Container Execution: Multi-tenant isolation via containers, processes, and thread pools
  • 🚨 Alerts & Monitoring: Comprehensive alerting system with email, webhook, and Slack notifications
  • 🌍 Global Deployment: Kubernetes-ready with multi-region support and automated deployment

Enterprise Features

  • 📈 Real-time Monitoring: Live performance metrics, resource utilization, and health monitoring
  • 🔄 Load Balancing: Intelligent request distribution with multiple scheduling strategies
  • 🛡️ Security & Isolation: Container-based task isolation with resource limits and security controls
  • 📱 Multi-Channel Alerts: Email, Slack, Discord, webhook, and SMS notification support
  • 🎛️ Threshold Monitoring: CPU, memory, GPU, and custom metric threshold alerting
  • 📋 Task Queue Management: Persistent task queues with priority scheduling and failure handling
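
To make "priority scheduling and failure handling" concrete, here is a minimal in-memory sketch. It is not ExoStack's implementation: the `TaskQueue` class, the convention that a lower `priority` value runs first, and the `max_attempts` requeue policy are all assumptions made for illustration, and a real deployment would persist the queue in the database.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Entry:
    # Only priority and seq take part in ordering.
    priority: int
    seq: int
    task_id: str = field(compare=False)
    attempts: int = field(default=0, compare=False)


class TaskQueue:
    """Hypothetical priority queue; a lower priority value runs first (assumption)."""

    def __init__(self, max_attempts: int = 3):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order per priority
        self.max_attempts = max_attempts
        self.dead = []  # ids of tasks that exhausted their attempts

    def submit(self, task_id: str, priority: int = 1, attempts: int = 0) -> None:
        entry = _Entry(priority, next(self._counter), task_id, attempts)
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Return the next entry, or None when the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None

    def fail(self, entry) -> bool:
        """Requeue a failed task; return False once its attempts are used up."""
        if entry.attempts + 1 >= self.max_attempts:
            self.dead.append(entry.task_id)
            return False
        self.submit(entry.task_id, entry.priority, entry.attempts + 1)
        return True
```

A requeued task keeps its priority but goes behind tasks of the same priority that were already waiting, which avoids one failing task starving its peers.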

🏗️ Architecture

ExoStack uses a hub-and-spoke architecture optimized for production deployments:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Web Dashboard │    │   ExoStack Hub   │    │  Alert Manager  │
│   (React UI)    │◄──►│  (Coordinator)   │◄──►│  (Monitoring)   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                    ┌───────────┼───────────┐
                    │           │           │
            ┌───────▼──┐ ┌──────▼──┐ ┌──────▼──┐
            │ Agent    │ │ Agent   │ │ Agent   │
            │ (GPU)    │ │ (CPU)   │ │ (Edge)  │
            └──────────┘ └─────────┘ └─────────┘

Core Components

  • 🎯 Hub: Central coordinator managing task distribution, node registration, and model registry
  • πŸ€– Agent: Worker nodes executing AI inference with support for CPU, GPU, and specialized hardware
  • πŸ“Š Dashboard: Real-time web interface for monitoring, management, and analytics
  • 🚨 Alert Manager: Comprehensive monitoring and notification system
  • πŸ’Ύ Database: Persistent storage for tasks, metrics, and system state
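
One way a hub can track which agents are alive is to record heartbeats and treat a node as stale after several missed intervals. The sketch below is an assumption about how that could work, not ExoStack's code: `NodeRegistry`, `missed_allowed`, and the injectable `now` argument are hypothetical, and the 30-second default mirrors the `HEARTBEAT_INTERVAL` value shown in the configuration section.

```python
import time


class NodeRegistry:
    """Hypothetical hub-side registry that tracks agent heartbeats."""

    def __init__(self, heartbeat_interval: float = 30.0, missed_allowed: int = 3):
        # A node is stale after `missed_allowed` intervals without a heartbeat.
        self.timeout = heartbeat_interval * missed_allowed
        self._last_seen = {}

    def heartbeat(self, node_id, now=None):
        """Record a heartbeat; `now` can be injected to make tests deterministic."""
        self._last_seen[node_id] = time.monotonic() if now is None else now

    def healthy_nodes(self, now=None):
        """Return the sorted ids of nodes seen within the timeout window."""
        now = time.monotonic() if now is None else now
        return sorted(
            node_id
            for node_id, seen in self._last_seen.items()
            if now - seen <= self.timeout
        )
```

Using a monotonic clock avoids false timeouts when the wall clock is adjusted.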

πŸš€ Quick Start

Option 1: One-Command Deployment (Recommended)

# Clone the repository
git clone https://github.com/Jitenderkumar2030/exostack.git
cd exostack

# Deploy with Docker Compose (easiest)
./deployment/install.sh --mode docker-compose

# Or deploy to Kubernetes
./deployment/install.sh --mode kubernetes --domain your-domain.com

# Or standalone deployment
./deployment/install.sh --mode standalone

Option 2: Manual Setup

Prerequisites

  • Python 3.8+
  • PostgreSQL 12+ (or SQLite for development)
  • Redis 6+ (for caching and pub/sub)
  • Docker (optional, for containerized deployment)
  • CUDA-compatible GPU (optional, for GPU acceleration)

Installation

  1. Clone and set up:
git clone https://github.com/Jitenderkumar2030/exostack.git
cd exostack
pip install -r requirements.txt
  2. Configure environment:
cp config/.env.example config/.env
# Edit config/.env with your settings
  3. Initialize database (choose one backend):
# For PostgreSQL
export DATABASE_URL="postgresql://user:pass@localhost:5432/exostack"
python -m alembic upgrade head

# For SQLite (development)
export DATABASE_URL="sqlite:///./exostack.db"
python -m alembic upgrade head
  4. Start services:
# Terminal 1: Start Hub
python -m exo_hub.main

# Terminal 2: Start Agent
python -m exo_agent.main

# Terminal 3: Start Dashboard (optional)
cd web_dashboard && npm install && npm start

📊 Web Dashboard

Access the modern React-based dashboard at http://localhost:3000:

  • 📈 Live Metrics: Real-time task execution, node health, and performance charts
  • 🎛️ Task Queue: Monitor pending, running, and completed tasks with filtering
  • 🖥️ Node Health: Interactive heatmap of node resources and status
  • 📋 Task History: Detailed execution history with performance analytics
  • 🔧 Model Registry: Manage and monitor loaded models across nodes
  • ⚙️ Settings: Configure thresholds, alerts, and system parameters

🔧 Configuration

Environment Variables

# Database Configuration
DATABASE_URL=postgresql://user:pass@localhost:5432/exostack
REDIS_URL=redis://localhost:6379/0

# Service Configuration
HUB_HOST=0.0.0.0
HUB_PORT=8000
AGENT_PORT=8001
DASHBOARD_PORT=3000

# Features
ENABLE_GPU=true
ENABLE_MONITORING=true
ENABLE_ALERTS=true
ENABLE_CONTAINERS=true

# Performance
MAX_CONCURRENT_TASKS=10
TASK_TIMEOUT=300
MODEL_CACHE_SIZE=10GB
HEARTBEAT_INTERVAL=30

# Security
JWT_SECRET=your-secret-key
ENABLE_AUTH=false

# Logging
LOG_LEVEL=INFO
LOG_FORMAT=json
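
Values such as `MODEL_CACHE_SIZE=10GB` and `ENABLE_AUTH=false` arrive as strings and need parsing before use. The following is a minimal sketch of how a service might read a few of the variables above; the `Settings` dataclass, the helper names, the binary interpretation of `GB`, and the fallback defaults (copied from the sample values) are assumptions rather than ExoStack's actual loader.

```python
import os
import re
from dataclasses import dataclass

_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a string such as '10GB' to bytes (binary units assumed)."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]B)\s*", text.upper())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_bool(text: str) -> bool:
    """Treat common truthy spellings as True and everything else as False."""
    return text.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    hub_port: int
    max_concurrent_tasks: int
    task_timeout: int  # assumed to be seconds
    model_cache_bytes: int
    enable_gpu: bool
    enable_auth: bool


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    return Settings(
        hub_port=int(env.get("HUB_PORT", "8000")),
        max_concurrent_tasks=int(env.get("MAX_CONCURRENT_TASKS", "10")),
        task_timeout=int(env.get("TASK_TIMEOUT", "300")),
        model_cache_bytes=parse_size(env.get("MODEL_CACHE_SIZE", "10GB")),
        enable_gpu=parse_bool(env.get("ENABLE_GPU", "true")),
        enable_auth=parse_bool(env.get("ENABLE_AUTH", "false")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.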

Alert Configuration

Configure monitoring thresholds in shared/config/alert_config.json:

{
  "threshold_rules": [
    {
      "rule_id": "cpu_usage_default",
      "name": "CPU Usage Monitor",
      "threshold_type": "cpu_usage",
      "warning_threshold": 80.0,
      "critical_threshold": 95.0,
      "notification_channels": ["email", "slack"]
    }
  ],
  "notification_configs": {
    "email": {
      "smtp_host": "smtp.gmail.com",
      "smtp_port": 587,
      "email_from": "alerts@yourcompany.com",
      "email_to": ["admin@yourcompany.com"]
    },
    "slack": {
      "slack_webhook_url": "https://hooks.slack.com/services/..."
    }
  }
}
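
To show how a rule like the one above might be applied, here is a small sketch that classifies a metric reading against the warning and critical thresholds. The function names, the `>=` comparison, and the idea that readings are keyed by `threshold_type` are assumptions; the Alert Manager's real evaluation logic may differ.

```python
import json

# Trimmed copy of the rule shown above.
RULES_JSON = """
{
  "threshold_rules": [
    {
      "rule_id": "cpu_usage_default",
      "threshold_type": "cpu_usage",
      "warning_threshold": 80.0,
      "critical_threshold": 95.0,
      "notification_channels": ["email", "slack"]
    }
  ]
}
"""


def classify(rule: dict, value: float) -> str:
    """Map a reading to 'ok', 'warning', or 'critical' for one rule."""
    if value >= rule["critical_threshold"]:
        return "critical"
    if value >= rule["warning_threshold"]:
        return "warning"
    return "ok"


def evaluate(config: dict, readings: dict) -> list:
    """Return (rule_id, severity, channels) for every rule that fires."""
    alerts = []
    for rule in config["threshold_rules"]:
        value = readings.get(rule["threshold_type"])
        if value is None:
            continue  # no reading for this metric yet
        severity = classify(rule, value)
        if severity != "ok":
            alerts.append((rule["rule_id"], severity, rule["notification_channels"]))
    return alerts
```

Checking the critical threshold first ensures a reading above both limits is reported once, at the higher severity.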

📚 API Reference

Hub Endpoints

Task Management

# Submit inference task
POST /api/tasks
{
  "model_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
  "input_data": {"prompt": "Hello, world!"},
  "priority": 1
}

# Get task status
GET /api/tasks/{task_id}

# List tasks with filtering
GET /api/tasks?status=running&limit=10

# Cancel task
DELETE /api/tasks/{task_id}
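
The task submission call above can be issued from Python with only the standard library. This is a sketch, assuming the hub listens on `http://localhost:8000` (the `HUB_PORT` shown earlier) and answers with a JSON body; the helper names are hypothetical and the response schema is not documented here, so the result is returned as a plain dictionary.

```python
import json
import urllib.request


def build_task_request(base_url, model_id, prompt, priority=1):
    """Build the POST /api/tasks request without sending it."""
    payload = {
        "model_id": model_id,
        "input_data": {"prompt": prompt},
        "priority": priority,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(base_url, model_id, prompt, priority=1, timeout=10.0):
    """Send the request and return the decoded JSON response (shape assumed)."""
    request = build_task_request(base_url, model_id, prompt, priority)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect or unit-test without a running hub.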

Node Management

# List registered nodes
GET /api/nodes

# Get node details
GET /api/nodes/{node_id}

# Update node configuration
PUT /api/nodes/{node_id}
{
  "max_concurrent_tasks": 5,
  "enabled": true
}

Model Registry

# List available models
GET /api/models

# Load model on nodes
POST /api/models/load
{
  "model_id": "microsoft/DialoGPT-medium",
  "source": "huggingface",
  "target_nodes": ["node-gpu-01"]
}

# Unload model
DELETE /api/models/{model_id}/nodes/{node_id}

Monitoring & Metrics

# Get system health
GET /api/health

# Get dashboard statistics
GET /api/stats/dashboard

# Get node metrics
GET /api/metrics/nodes/{node_id}

# Get task metrics
GET /api/metrics/tasks

Agent Endpoints

Health & Status

# Agent health check
GET /health

# Agent capabilities
GET /capabilities

# Resource usage
GET /metrics

Model Operations

# List loaded models
GET /models

# Load model
POST /models/load
{
  "model_id": "gpt2",
  "source": "huggingface"
}

# Run inference
POST /inference
{
  "model_id": "gpt2",
  "input_data": {"prompt": "Hello"},
  "parameters": {"max_tokens": 100}
}

🚀 Deployment Options

Docker Compose (Recommended for Development)

# Quick start with Docker Compose
./deployment/install.sh --mode docker-compose

# Custom configuration
./deployment/install.sh --mode docker-compose --domain localhost --no-gpu

Kubernetes (Production)

# Deploy to existing cluster
./deployment/install.sh --mode kubernetes --domain your-domain.com

# With custom namespace
./deployment/install.sh --mode kubernetes --namespace exostack-prod --domain api.yourcompany.com

Standalone (Development)

# Local development setup
./deployment/install.sh --mode standalone --no-monitoring

🛠️ Development

Running Tests

# Run all tests
pytest tests/

# Run specific test suite
pytest tests/test_hub.py
pytest tests/test_agent.py
pytest tests/test_integration.py

# Run with coverage
pytest --cov=exo_hub --cov=exo_agent tests/

# Run dashboard tests
cd web_dashboard && npm test

Building for Production

# Build Docker images
docker build -t exostack/hub:latest -f docker/Dockerfile.hub .
docker build -t exostack/agent:latest -f docker/Dockerfile.agent .
docker build -t exostack/dashboard:latest -f docker/Dockerfile.dashboard ./web_dashboard

# Build and push to registry
docker-compose -f docker-compose.prod.yml build
docker-compose -f docker-compose.prod.yml push

Local Development

# Start development environment
docker-compose -f docker-compose.dev.yml up -d

# Watch for changes (hot reload)
python -m exo_hub.main --reload
python -m exo_agent.main --reload

# Dashboard development server
cd web_dashboard && npm run dev

🔧 Advanced Configuration

Worker Pool Configuration

# shared/config/worker_config.py
WORKER_POOL_CONFIG = {
    "max_thread_workers": 4,
    "max_process_workers": 2,
    "enable_containers": True,
    "container_image": "exostack/inference:latest",
    "resource_limits": {
        "memory_mb": 2048,
        "cpu_cores": 2,
        "timeout_seconds": 300
    }
}
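
As a rough illustration of how `max_thread_workers` and `timeout_seconds` could drive a thread pool, consider the sketch below. It uses only `concurrent.futures` from the standard library; the `run_with_timeout` helper and its result tuples are hypothetical and do not describe ExoStack's worker pool.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Subset of the configuration shown above.
WORKER_POOL_CONFIG = {
    "max_thread_workers": 4,
    "resource_limits": {"timeout_seconds": 300},
}


def run_with_timeout(fn, args_list, config=None):
    """Run fn(*args) for each args tuple and report (status, value) per call."""
    config = WORKER_POOL_CONFIG if config is None else config
    timeout = config["resource_limits"]["timeout_seconds"]
    results = []
    with ThreadPoolExecutor(max_workers=config["max_thread_workers"]) as pool:
        futures = [pool.submit(fn, *args) for args in args_list]
        for future in futures:
            try:
                results.append(("ok", future.result(timeout=timeout)))
            except FutureTimeout:
                # The wait gave up; the thread itself keeps running.
                results.append(("timeout", None))
            except Exception as exc:
                results.append(("error", repr(exc)))
    return results
```

Note that a thread cannot be forcibly stopped, so the timeout only bounds how long the caller waits. Enforcing memory, CPU, or hard time limits requires process or container isolation, which is presumably why the configuration also offers `max_process_workers` and `enable_containers`.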

Model Registry Configuration

# shared/config/model_config.py
MODEL_REGISTRY_CONFIG = {
    "cache_dir": "/app/models_cache",
    "max_cache_size_gb": 50,
    "preload_models": [
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "microsoft/DialoGPT-medium"
    ],
    "sources": {
        "huggingface": {"enabled": True},
        "local": {"enabled": True, "base_path": "/models"},
        "remote": {"enabled": True, "timeout": 300}
    }
}
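
A size-bounded cache needs an eviction policy once `max_cache_size_gb` is reached. The sketch below uses least-recently-used eviction, which is an assumption: the README does not say which policy ExoStack applies, and the `ModelCache` class is hypothetical. It tracks sizes only and leaves deleting files from `cache_dir` to the caller.

```python
from collections import OrderedDict


class ModelCache:
    """Hypothetical LRU index over cached models, bounded by total size."""

    def __init__(self, max_cache_size_gb: float = 50):
        self.max_bytes = int(max_cache_size_gb * 1024**3)
        self._entries = OrderedDict()  # model_id -> size in bytes, oldest first
        self.used_bytes = 0

    def touch(self, model_id: str) -> bool:
        """Mark a cached model as recently used; return False if it is absent."""
        if model_id not in self._entries:
            return False
        self._entries.move_to_end(model_id)
        return True

    def add(self, model_id: str, size_bytes: int) -> list:
        """Record a model and return the ids evicted to make room for it."""
        if size_bytes > self.max_bytes:
            raise ValueError("model is larger than the whole cache")
        if model_id in self._entries:
            # Re-adding replaces the old entry and its size.
            self.used_bytes -= self._entries.pop(model_id)
        evicted = []
        while self.used_bytes + size_bytes > self.max_bytes:
            old_id, old_size = self._entries.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_id)
        self._entries[model_id] = size_bytes
        self.used_bytes += size_bytes
        return evicted
```

Models listed under `preload_models` would likely need to be exempt from eviction in practice; that refinement is omitted here.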

Scheduler Configuration

# shared/config/scheduler_config.py
SCHEDULER_CONFIG = {
    "strategy": "balanced",  # performance, resource, balanced
    "weights": {
        "model_compatibility": 0.4,
        "resource_availability": 0.3,
        "performance_history": 0.3
    },
    "cache_ttl_seconds": 300,
    "max_retries": 3
}
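
The weights above suggest a weighted-sum score per node. The following sketch shows one plausible reading, assuming each signal is normalized to the range 0 to 1 and that a node with zero model compatibility is never selected; `score_node`, `pick_node`, and the sample node data are hypothetical, and the actual scheduler may combine the signals differently.

```python
WEIGHTS = {
    "model_compatibility": 0.4,
    "resource_availability": 0.3,
    "performance_history": 0.3,
}


def score_node(signals: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-node signals, each expected in the range 0..1."""
    return sum(weight * signals[name] for name, weight in weights.items())


def pick_node(nodes: dict, weights: dict = WEIGHTS):
    """Return the id of the best-scoring compatible node, or None."""
    candidates = {
        node_id: signals
        for node_id, signals in nodes.items()
        if signals["model_compatibility"] > 0
    }
    if not candidates:
        return None
    return max(candidates, key=lambda node_id: score_node(candidates[node_id], weights))


if __name__ == "__main__":
    nodes = {
        "node-gpu-01": {
            "model_compatibility": 1.0,
            "resource_availability": 0.2,
            "performance_history": 0.9,
        },
        "node-cpu-01": {
            "model_compatibility": 0.5,
            "resource_availability": 0.9,
            "performance_history": 0.5,
        },
    }
    print(pick_node(nodes))
```

With these sample values the GPU node scores about 0.73 and the CPU node about 0.62, so the busy but well-matched GPU node still wins under the "balanced" weights.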

📊 Monitoring & Observability

Metrics Collection

ExoStack automatically collects comprehensive metrics:

  • Task Metrics: Execution time, success rate, queue length
  • Node Metrics: CPU, memory, GPU utilization, disk usage
  • Model Metrics: Load time, inference latency, memory usage
  • System Metrics: Request rate, error rate, response time

Grafana Dashboard

Import the provided Grafana dashboard for visualization:

# Import dashboard
curl -X POST http://grafana:3000/api/dashboards/db \
  -H "Content-Type: application/json" \
  -d @monitoring/grafana/exostack-dashboard.json

Prometheus Integration

ExoStack exposes metrics in Prometheus format:

# prometheus.yml
scrape_configs:
  - job_name: 'exostack-hub'
    static_configs:
      - targets: ['exostack-hub:8000']
    metrics_path: '/metrics'

  - job_name: 'exostack-agents'
    static_configs:
      - targets: ['exostack-agent:8001']
    metrics_path: '/metrics'
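
For quick checks without a Prometheus server, the text exposition format served at `/metrics` can be read directly. Below is a minimal parser sketch for simple samples; it skips comment lines, ignores optional timestamps, and does not unescape label values. The metric names in the usage example are invented for illustration and are not taken from ExoStack.

```python
import re

# name, optional {labels}, value, optional integer timestamp
_SAMPLE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Return (name, labels, value) tuples for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match:
            continue  # ignore lines this simple parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    example = (
        "# TYPE exostack_tasks_total counter\n"
        'exostack_tasks_total{status="completed"} 42\n'
        "exostack_queue_length 3\n"
    )
    for sample in parse_metrics(example):
        print(sample)
```

For anything beyond ad hoc inspection, a maintained Prometheus client library is a safer choice than a hand-written parser.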

🔒 Security

Authentication & Authorization

# Enable authentication
export ENABLE_AUTH=true
export JWT_SECRET=your-super-secret-key

# Create API key
curl -X POST http://localhost:8000/api/auth/keys \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "production-key", "permissions": ["read", "write"]}'

Container Security

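# Kubernetes security context
# Pod-level settings (spec.securityContext)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

# Container-level settings (spec.containers[].securityContext)
securityContext:
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true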

🚀 Production Deployment

High Availability Setup

# Deploy with HA configuration
./deployment/install.sh --mode kubernetes \
  --domain api.yourcompany.com \
  --replicas 3 \
  --enable-monitoring \
  --enable-backup

Backup & Recovery

# Database backup
kubectl exec -n exostack postgres-0 -- pg_dump -U exostack exostack > backup.sql

# Model cache backup
kubectl exec -n exostack exostack-hub-0 -- tar -czf - /app/models_cache > models_backup.tar.gz

# Restore from backup
kubectl exec -i -n exostack postgres-0 -- psql -U exostack exostack < backup.sql

Scaling

# Scale agents
kubectl scale deployment exostack-agent-cpu --replicas=5 -n exostack

# Scale hub (if stateless)
kubectl scale deployment exostack-hub --replicas=3 -n exostack

# Auto-scaling with HPA
kubectl autoscale deployment exostack-agent-cpu --cpu-percent=70 --min=2 --max=10 -n exostack
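
To reason about what the autoscaler will do with `--cpu-percent=70 --min=2 --max=10`, it helps to compute the documented Horizontal Pod Autoscaler formula by hand: desired replicas = ceil(current replicas × current metric / target metric), clamped to the min and max. The sketch below is a simplified model for estimation only; it applies the default 10% tolerance but ignores stabilization windows, pod readiness, and missing metrics, and the function name is hypothetical.

```python
import math


def desired_replicas(
    current: int,
    current_cpu_percent: float,
    target_cpu_percent: float = 70,
    min_replicas: int = 2,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Estimate the replica count the HPA would aim for."""
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        wanted = current  # close enough to target: no scaling
    else:
        wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 agents averaging 105% of requested CPU against a 70% target.
    print(desired_replicas(4, 105))
```

The utilization percentage is measured against each pod's CPU request, so the agent deployment needs CPU requests set for this autoscaling command to have any effect.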

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

  1. Fork and clone:
git clone https://github.com/yourusername/exostack.git
cd exostack
  2. Set up development environment:
python -m venv venv
source venv/bin/activate
pip install -r requirements-dev.txt
pre-commit install
  3. Run tests:
pytest tests/
cd web_dashboard && npm test
  4. Submit PR:
git checkout -b feature/amazing-feature
git commit -m 'Add amazing feature'
git push origin feature/amazing-feature

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • HuggingFace for the transformers library and model hub
  • FastAPI for the excellent web framework
  • React and Tailwind CSS for the modern UI
  • PostgreSQL and Redis for reliable data storage
  • Docker and Kubernetes for containerization and orchestration
  • The open-source community for inspiration and contributions

📞 Support & Community

🔄 Changelog

See CHANGELOG.md for detailed version history and release notes.


Built with ❤️ for the AI community

About

**ExoStack** is an open-source hybrid AI orchestration platform designed to connect, register, and distribute ML/LLM workloads across a decentralized cluster of edge, cloud, and local GPU devices. Inspired by Exolabs and Dstack. ExoStack combines the best of peer-based compute
