ExoStack is an enterprise-grade distributed AI orchestration platform that enables seamless deployment, management, and scaling of AI models across multiple nodes. Built for production workloads, it provides intelligent scheduling, comprehensive monitoring, and multi-tenant isolation for AI inference at scale.
- Distributed Architecture: Horizontally scalable AI inference across multiple nodes
- Model Registry: Centralized model management with auto-loading from HuggingFace, local files, and remote URLs
- GPU Agent Detection: Automatic detection and intelligent routing to GPU-enabled nodes
- Streaming Inference: Real-time inference via Server-Sent Events (SSE) and WebSocket
- Fine-tuning Integration: Built-in support for model fine-tuning with popular frameworks
- Persistent Task Storage: PostgreSQL/SQLite backend with comprehensive task tracking and metrics
- Model-Aware Scheduling: Intelligent node selection based on model compatibility and performance history
- Advanced Model Packaging: Multi-source model support with caching, preloading, and validation
- Web Dashboard: Modern React-based dashboard with real-time monitoring and analytics
- Worker Pool & Container Execution: Multi-tenant isolation via containers, processes, and thread pools
- Alerts & Monitoring: Comprehensive alerting system with email, webhook, and Slack notifications
- Global Deployment: Kubernetes-ready with multi-region support and automated deployment
- Real-time Monitoring: Live performance metrics, resource utilization, and health monitoring
- Load Balancing: Intelligent request distribution with multiple scheduling strategies
- Security & Isolation: Container-based task isolation with resource limits and security controls
- Multi-Channel Alerts: Email, Slack, Discord, webhook, and SMS notification support
- Threshold Monitoring: CPU, memory, GPU, and custom metric threshold alerting
- Task Queue Management: Persistent task queues with priority scheduling and failure handling
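Model-aware scheduling and load balancing both reduce to ranking nodes by several per-node signals. Below is a minimal sketch of such a ranking. It assumes a weighted sum of factors, each normalized to [0, 1], and uses the default weights from the `weights` block of `SCHEDULER_CONFIG` (the `balanced` strategy). `NodeStats`, `model_compatibility`, `score_node`, and `pick_node` are hypothetical names, not ExoStack's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Assumed defaults, copied from the "weights" block of SCHEDULER_CONFIG.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "model_compatibility": 0.4,
    "resource_availability": 0.3,
    "performance_history": 0.3,
}


@dataclass
class NodeStats:
    """Hypothetical per-node snapshot; not ExoStack's real data model."""
    node_id: str
    loaded_models: Set[str] = field(default_factory=set)
    supported_models: Set[str] = field(default_factory=set)
    free_capacity: float = 0.0  # fraction of resources currently free, 0..1
    success_rate: float = 1.0   # recent task success rate, 0..1


def model_compatibility(node: NodeStats, model_id: str) -> float:
    # Already loaded beats "can load it"; anything else cannot run the task.
    if model_id in node.loaded_models:
        return 1.0
    if model_id in node.supported_models:
        return 0.5
    return 0.0


def score_node(node: NodeStats, model_id: str,
               weights: Optional[Dict[str, float]] = None) -> float:
    weights = DEFAULT_WEIGHTS if weights is None else weights
    factors = {
        "model_compatibility": model_compatibility(node, model_id),
        "resource_availability": node.free_capacity,
        "performance_history": node.success_rate,
    }
    return sum(weights[name] * value for name, value in factors.items())


def pick_node(nodes: List[NodeStats], model_id: str) -> Optional[NodeStats]:
    # Drop incompatible nodes first, then take the highest weighted score.
    candidates = [n for n in nodes if model_compatibility(n, model_id) > 0]
    return max(candidates, key=lambda n: score_node(n, model_id), default=None)


nodes = [
    NodeStats("node-gpu-01", loaded_models={"gpt2"}, free_capacity=0.5, success_rate=0.99),
    NodeStats("node-cpu-01", supported_models={"gpt2"}, free_capacity=0.9, success_rate=0.95),
]
best = pick_node(nodes, "gpt2")
print(best.node_id if best else "no compatible node")
```

In this example the GPU node wins (0.847 vs 0.755), even though the CPU node has more free capacity, because it already has the model loaded. Under this sketch, the `performance` and `resource` strategies would simply be different weight sets.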
ExoStack uses a hub-and-spoke architecture optimized for production deployments:
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Web Dashboard  │    │   ExoStack Hub   │    │  Alert Manager  │
│   (React UI)    │◄──►│  (Coordinator)   │◄──►│  (Monitoring)   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                 │
                     ┌───────────┼───────────┐
                     │           │           │
               ┌─────▼────┐ ┌────▼─────┐ ┌───▼──────┐
               │  Agent   │ │  Agent   │ │  Agent   │
               │  (GPU)   │ │  (CPU)   │ │  (Edge)  │
               └──────────┘ └──────────┘ └──────────┘
- Hub: Central coordinator managing task distribution, node registration, and model registry
- Agent: Worker nodes executing AI inference with support for CPU, GPU, and specialized hardware
- Dashboard: Real-time web interface for monitoring, management, and analytics
- Alert Manager: Comprehensive monitoring and notification system
- Database: Persistent storage for tasks, metrics, and system state
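The hub's node-registration role can be pictured as a heartbeat table. This is a minimal in-memory illustration with three assumptions:

- Agents report every `HEARTBEAT_INTERVAL` seconds (30 in the sample `config/.env`).
- An agent that misses three consecutive beats is treated as offline.
- An injectable clock stands in for real time, which keeps the sketch testable.

`AgentRegistry` and its methods are hypothetical, not ExoStack's actual implementation.

```python
import time
from typing import Callable, Dict, List


class AgentRegistry:
    """Hypothetical hub-side table of agents and their last heartbeat time."""

    def __init__(self, heartbeat_interval: float = 30.0, missed_beats: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # An agent is online while its last beat is at most this many seconds old.
        self.timeout = heartbeat_interval * missed_beats
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, node_id: str) -> None:
        # Registration and heartbeats share one path: any beat marks the node alive.
        self._last_seen[node_id] = self._clock()

    def online_nodes(self) -> List[str]:
        now = self._clock()
        return sorted(n for n, seen in self._last_seen.items() if now - seen <= self.timeout)

    def prune_stale(self) -> List[str]:
        """Remove agents that missed too many heartbeats and return their IDs."""
        now = self._clock()
        stale = [n for n, seen in self._last_seen.items() if now - seen > self.timeout]
        for node_id in stale:
            del self._last_seen[node_id]
        return sorted(stale)


fake_now = [0.0]
registry = AgentRegistry(clock=lambda: fake_now[0])
registry.heartbeat("agent-gpu")
registry.heartbeat("agent-cpu")
fake_now[0] = 60.0
registry.heartbeat("agent-cpu")  # agent-gpu stays silent from here on
fake_now[0] = 100.0
print(registry.online_nodes())  # ['agent-cpu']
print(registry.prune_stale())   # ['agent-gpu']
```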
# Clone the repository
git clone https://github.com/yourusername/exostack.git
cd exostack
# Deploy with Docker Compose (easiest)
./deployment/install.sh --mode docker-compose
# Or deploy to Kubernetes
./deployment/install.sh --mode kubernetes --domain your-domain.com
# Or standalone deployment
./deployment/install.sh --mode standalone
Prerequisites for a manual installation:
- Python 3.8+
- PostgreSQL 12+ (or SQLite for development)
- Redis 6+ (for caching and pub/sub)
- Docker (optional, for containerized deployment)
- CUDA-compatible GPU (optional, for GPU acceleration)
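A quick sanity check before a manual installation can save time. This standard-library script is an illustrative sketch, not part of ExoStack. It verifies the Python version and looks for the command-line tools `docker`, `nvidia-smi`, `psql`, and `redis-cli` on `PATH`. It cannot confirm server versions or whether PostgreSQL and Redis are reachable, since they may run on other hosts.

```python
import shutil
import sys
from typing import Dict

MIN_PYTHON = (3, 8)

# Tools whose presence on PATH hints that a prerequisite is installed locally.
TOOLS = {
    "docker": "Docker",
    "nvidia-smi": "NVIDIA driver (GPU acceleration)",
    "psql": "PostgreSQL client",
    "redis-cli": "Redis client",
}


def check_prerequisites() -> Dict[str, bool]:
    results = {"Python 3.8+": sys.version_info[:2] >= MIN_PYTHON}
    for command, label in TOOLS.items():
        results[label] = shutil.which(command) is not None
    return results


if __name__ == "__main__":
    for label, found in check_prerequisites().items():
        status = "ok" if found else "missing"
        print(f"{status:8} {label}")
```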
- Clone and setup:
git clone https://github.com/yourusername/exostack.git
cd exostack
pip install -r requirements.txt
- Configure environment:
cp config/.env.example config/.env
# Edit config/.env with your settings
- Initialize database:
# For PostgreSQL
export DATABASE_URL="postgresql://user:pass@localhost:5432/exostack"
python -m alembic upgrade head
# For SQLite (development)
export DATABASE_URL="sqlite:///./exostack.db"
python -m alembic upgrade head
- Start services:
# Terminal 1: Start Hub
python -m exo_hub.main
# Terminal 2: Start Agent
python -m exo_agent.main
# Terminal 3: Start Dashboard (optional)
cd web_dashboard && npm install && npm start
Access the modern React-based dashboard at http://localhost:3000:
- Live Metrics: Real-time task execution, node health, and performance charts
- Task Queue: Monitor pending, running, and completed tasks with filtering
- Node Health: Interactive heatmap of node resources and status
- Task History: Detailed execution history with performance analytics
- Model Registry: Manage and monitor loaded models across nodes
- Settings: Configure thresholds, alerts, and system parameters
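The threshold rules configured here and in `shared/config/alert_config.json` pair a warning level with a critical level. Below is a minimal sketch of how one rule might be evaluated. It assumes that a higher value is worse and that reaching a threshold exactly already triggers it. `ThresholdRule` and `evaluate` are illustrative names; only the JSON field names come from the sample configuration.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ThresholdRule:
    rule_id: str
    threshold_type: str
    warning_threshold: float
    critical_threshold: float

    @classmethod
    def from_config(cls, entry: Dict[str, Any]) -> "ThresholdRule":
        # Keys such as "name" and "notification_channels" are not needed here.
        return cls(
            rule_id=entry["rule_id"],
            threshold_type=entry["threshold_type"],
            warning_threshold=float(entry["warning_threshold"]),
            critical_threshold=float(entry["critical_threshold"]),
        )


def evaluate(rule: ThresholdRule, value: float) -> Optional[str]:
    """Return "critical", "warning", or None if the value is within limits.

    Assumes higher is worse (as for CPU, memory, and GPU usage) and that
    reaching a threshold exactly already triggers it.
    """
    if value >= rule.critical_threshold:
        return "critical"
    if value >= rule.warning_threshold:
        return "warning"
    return None


config = json.loads("""
{"threshold_rules": [{"rule_id": "cpu_usage_default", "name": "CPU Usage Monitor",
  "threshold_type": "cpu_usage", "warning_threshold": 80.0,
  "critical_threshold": 95.0, "notification_channels": ["email", "slack"]}]}
""")
rules = [ThresholdRule.from_config(e) for e in config["threshold_rules"]]
for cpu in (42.0, 85.5, 97.0):
    print(cpu, evaluate(rules[0], cpu))  # None, then warning, then critical
```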
# Database Configuration
DATABASE_URL=postgresql://user:pass@localhost:5432/exostack
REDIS_URL=redis://localhost:6379/0
# Service Configuration
HUB_HOST=0.0.0.0
HUB_PORT=8000
AGENT_PORT=8001
DASHBOARD_PORT=3000
# Features
ENABLE_GPU=true
ENABLE_MONITORING=true
ENABLE_ALERTS=true
ENABLE_CONTAINERS=true
# Performance
MAX_CONCURRENT_TASKS=10
TASK_TIMEOUT=300
MODEL_CACHE_SIZE=10GB
HEARTBEAT_INTERVAL=30
# Security
JWT_SECRET=your-secret-key
ENABLE_AUTH=false
# Logging
LOG_LEVEL=INFO
LOG_FORMAT=json
Configure monitoring thresholds in shared/config/alert_config.json:
{
"threshold_rules": [
{
"rule_id": "cpu_usage_default",
"name": "CPU Usage Monitor",
"threshold_type": "cpu_usage",
"warning_threshold": 80.0,
"critical_threshold": 95.0,
"notification_channels": ["email", "slack"]
}
],
"notification_configs": {
"email": {
"smtp_host": "smtp.gmail.com",
"smtp_port": 587,
"email_from": "alerts@yourcompany.com",
"email_to": ["admin@yourcompany.com"]
},
"slack": {
"slack_webhook_url": "https://hooks.slack.com/services/..."
}
}
}
# Submit inference task
POST /api/tasks
{
"model_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"input_data": {"prompt": "Hello, world!"},
"priority": 1
}
# Get task status
GET /api/tasks/{task_id}
# List tasks with filtering
GET /api/tasks?status=running&limit=10
# Cancel task
DELETE /api/tasks/{task_id}
# List registered nodes
GET /api/nodes
# Get node details
GET /api/nodes/{node_id}
# Update node configuration
PUT /api/nodes/{node_id}
{
"max_concurrent_tasks": 5,
"enabled": true
}
# List available models
GET /api/models
# Load model on nodes
POST /api/models/load
{
"model_id": "microsoft/DialoGPT-medium",
"source": "huggingface",
"target_nodes": ["node-gpu-01"]
}
# Unload model
DELETE /api/models/{model_id}/nodes/{node_id}
# Get system health
GET /api/health
# Get dashboard statistics
GET /api/stats/dashboard
# Get node metrics
GET /api/metrics/nodes/{node_id}
# Get task metrics
GET /api/metrics/tasks
# Agent health check
GET /health
# Agent capabilities
GET /capabilities
# Resource usage
GET /metrics
# List loaded models
GET /models
# Load model
POST /models/load
{
"model_id": "gpt2",
"source": "huggingface"
}
# Run inference
POST /inference
{
"model_id": "gpt2",
"input_data": {"prompt": "Hello"},
"parameters": {"max_tokens": 100}
}
# Quick start with Docker Compose
./deployment/install.sh --mode docker-compose
# Custom configuration
./deployment/install.sh --mode docker-compose --domain localhost --no-gpu
# Deploy to existing cluster
./deployment/install.sh --mode kubernetes --domain your-domain.com
# With custom namespace
./deployment/install.sh --mode kubernetes --namespace exostack-prod --domain api.yourcompany.com
# Local development setup
./deployment/install.sh --mode standalone --no-monitoring
# Run all tests
pytest tests/
# Run specific test suite
pytest tests/test_hub.py
pytest tests/test_agent.py
pytest tests/test_integration.py
# Run with coverage
pytest --cov=exo_hub --cov=exo_agent tests/
# Run dashboard tests
cd web_dashboard && npm test
# Build Docker images
docker build -t exostack/hub:latest -f docker/Dockerfile.hub .
docker build -t exostack/agent:latest -f docker/Dockerfile.agent .
docker build -t exostack/dashboard:latest -f docker/Dockerfile.dashboard ./web_dashboard
# Build and push to registry
docker-compose -f docker-compose.prod.yml build
docker-compose -f docker-compose.prod.yml push
# Start development environment
docker-compose -f docker-compose.dev.yml up -d
# Watch for changes (hot reload)
python -m exo_hub.main --reload
python -m exo_agent.main --reload
# Dashboard development server
cd web_dashboard && npm run dev
# shared/config/worker_config.py
WORKER_POOL_CONFIG = {
"max_thread_workers": 4,
"max_process_workers": 2,
"enable_containers": True,
"container_image": "exostack/inference:latest",
"resource_limits": {
"memory_mb": 2048,
"cpu_cores": 2,
"timeout_seconds": 300
}
}
# shared/config/model_config.py
MODEL_REGISTRY_CONFIG = {
"cache_dir": "/app/models_cache",
"max_cache_size_gb": 50,
"preload_models": [
"TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"microsoft/DialoGPT-medium"
],
"sources": {
"huggingface": {"enabled": True},
"local": {"enabled": True, "base_path": "/models"},
"remote": {"enabled": True, "timeout": 300}
}
}
# shared/config/scheduler_config.py
SCHEDULER_CONFIG = {
"strategy": "balanced", # one of: performance, resource, balanced
"weights": {
"model_compatibility": 0.4,
"resource_availability": 0.3,
"performance_history": 0.3
},
"cache_ttl_seconds": 300,
"max_retries": 3
}
ExoStack automatically collects comprehensive metrics:
- Task Metrics: Execution time, success rate, queue length
- Node Metrics: CPU, memory, GPU utilization, disk usage
- Model Metrics: Load time, inference latency, memory usage
- System Metrics: Request rate, error rate, response time
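Prometheus scrapes `/metrics` endpoints that serve the Prometheus text exposition format. The sketch below renders metrics of the kinds listed above into that format using only the standard library. The metric names `exostack_tasks_total` and `exostack_node_cpu_usage_percent` are hypothetical. A real service would more likely use the official `prometheus_client` package.

```python
from typing import Dict, List, Tuple, Union

Number = Union[int, float]
Sample = Tuple[Dict[str, str], Number]


def escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, metric_type: str, help_text: str,
                  samples: List[Sample]) -> str:
    """Render one metric family: HELP, TYPE, then one line per sample."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        pairs = ",".join(f'{key}="{escape_label_value(val)}"'
                         for key, val in sorted(labels.items()))
        label_block = "{" + pairs + "}" if pairs else ""
        lines.append(f"{name}{label_block} {value}")
    return "\n".join(lines) + "\n"


payload = render_metric(
    "exostack_tasks_total", "counter", "Tasks by final status.",
    [({"status": "completed"}, 120), ({"status": "failed"}, 3)],
) + render_metric(
    "exostack_node_cpu_usage_percent", "gauge", "CPU utilization per node.",
    [({"node_id": "node-gpu-01"}, 42.5)],
)
print(payload, end="")
```

The scrape jobs in `prometheus.yml` expect this string to be served from `/metrics` with the content type `text/plain; version=0.0.4`.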
Import the provided Grafana dashboard for visualization:
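# Import dashboard (Grafana's HTTP API requires authentication; $GRAFANA_TOKEN is a placeholder for a service account token)
curl -X POST http://grafana:3000/api/dashboards/db \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $GRAFANA_TOKEN" \
-d @monitoring/grafana/exostack-dashboard.json
ExoStack exposes metrics in Prometheus format: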
# prometheus.yml
scrape_configs:
  - job_name: 'exostack-hub'
    static_configs:
      - targets: ['exostack-hub:8000']
    metrics_path: '/metrics'
  - job_name: 'exostack-agents'
    static_configs:
      - targets: ['exostack-agent:8001']
    metrics_path: '/metrics'
# Enable authentication
export ENABLE_AUTH=true
export JWT_SECRET=your-super-secret-key
# Create API key
curl -X POST http://localhost:8000/api/auth/keys \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "production-key", "permissions": ["read", "write"]}'
# Kubernetes security context
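# Pod-level settings (spec.securityContext)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
# Container-level settings (spec.containers[].securityContext)
securityContext:
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
# Deploy with HA configuration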
./deployment/install.sh --mode kubernetes \
--domain api.yourcompany.com \
--replicas 3 \
--enable-monitoring \
--enable-backup
# Database backup
kubectl exec -n exostack postgres-0 -- pg_dump -U exostack exostack > backup.sql
# Model cache backup
kubectl exec -n exostack exostack-hub-0 -- tar -czf - /app/models_cache > models_backup.tar.gz
# Restore from backup
kubectl exec -i -n exostack postgres-0 -- psql -U exostack exostack < backup.sql
# Scale agents
kubectl scale deployment exostack-agent-cpu --replicas=5 -n exostack
# Scale hub (if stateless)
kubectl scale deployment exostack-hub --replicas=3 -n exostack
# Auto-scaling with HPA
kubectl autoscale deployment exostack-agent-cpu --cpu-percent=70 --min=2 --max=10 -n exostack
We welcome contributions! Please see our Contributing Guide for details.
- Fork and clone:
git clone https://github.com/yourusername/exostack.git
cd exostack
- Setup development environment:
python -m venv venv
source venv/bin/activate
pip install -r requirements-dev.txt
pre-commit install
- Run tests:
pytest tests/
cd web_dashboard && npm test
- Submit PR:
git checkout -b feature/amazing-feature
git commit -m 'Add amazing feature'
git push origin feature/amazing-feature
This project is licensed under the MIT License - see the LICENSE file for details.
- HuggingFace for the transformers library and model hub
- FastAPI for the excellent web framework
- React and Tailwind CSS for the modern UI
- PostgreSQL and Redis for reliable data storage
- Docker and Kubernetes for containerization and orchestration
- The open-source community for inspiration and contributions
- GitHub Issues: Report bugs and request features
- Documentation: Comprehensive docs and tutorials
- Discord Community: Join our community
- Email Support: support@exostack.dev
See CHANGELOG.md for detailed version history and release notes.
Built with ❤️ for the AI community