Production-ready real-time speaker diarization system with a hybrid Python/Rust architecture, optimized for streaming with <100ms end-to-end latency.
VoiceFlow Intelligence Platform is an audio processing system that identifies "who speaks when" in audio streams. It combines:
- Python ML Service: Model training, ONNX export, batch processing
- Rust Inference Engine: Real-time WebSocket streaming with ultra-low latency
- MLOps Pipeline: Complete CI/CD, monitoring, containerization
✅ Ultra-Low Latency - 4.48ms P99 model inference, 40-80ms P99 end-to-end
✅ Real-Time Streaming - WebSocket audio streaming with <100ms P99 latency target
✅ Batch Processing - Asynchronous processing of audio files via REST API
✅ Model Management - Training, versioning, A/B testing, hot-reload
✅ Production-Ready - Docker Compose, Prometheus metrics, Grafana dashboards
✅ High Performance - Optimized ONNX Runtime with 297 req/s throughput (CPU)
✅ Scalable - Stateless services, horizontal scaling support
┌─────────────┐
│   Client    │
└──────┬──────┘
       │
       ├─── HTTP/REST ────────► Python ML Service (Port 8000)
       │                         ├─ FastAPI
       │                         ├─ PyTorch Training
       │                         ├─ ONNX Export
       │                         └─ Batch Jobs
       │
       └─── WebSocket ──────────► Rust Inference Engine (Port 3000)
                                   ├─ Axum/Tokio
                                   ├─ ONNX Runtime
                                   ├─ WebSocket Server
                                   └─ Real-Time Processing

┌──────────────────┐    ┌──────────────┐
│   PostgreSQL     │    │    Redis     │
│   (Metadata,     │    │   (Cache,    │
│    Audit Logs)   │    │  Rate Limit) │
└──────────────────┘    └──────────────┘

┌──────────────────┐    ┌──────────────┐
│   Prometheus     │───►│   Grafana    │
│   (Metrics)      │    │  (Dashboard) │
└──────────────────┘    └──────────────┘
For detailed architecture diagrams, see docs/CONCEPTION_TECHNIQUE.md.
- Docker 20.10+ & Docker Compose 2.0+
- Python 3.11+ (for local development)
- Rust 1.75+ (for local development)
- Git
git clone https://github.com/YOUR_USERNAME/VoiceFlow-Intelligence-Platform.git
cd VoiceFlow-Intelligence-Platform
docker-compose up --build

This starts:
- Python ML Service (http://localhost:8000)
- Rust Inference Engine (http://localhost:3000)
- PostgreSQL (port 5432)
- Redis (port 6379)
- Prometheus (http://localhost:9090)
- Grafana (http://localhost:3001)
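
The containers can take a little while to come up after `docker-compose up`. The sketch below polls the two `/health` endpoints until both respond. It is a minimal illustration, not part of the repository: the helper names (`is_healthy`, `wait_for_services`) and the retry counts are assumptions, and a healthy service is assumed to answer with HTTP 200.

```python
import time
import urllib.error
import urllib.request

# Health endpoints of the two services listed above.
HEALTH_URLS = {
    "python-ml": "http://localhost:8000/health",
    "rust-inference": "http://localhost:3000/health",
}


def is_healthy(url, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200 (assumed healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_for_services(urls, check=is_healthy, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until every URL passes or attempts run out.

    Returns the sorted names of services that are still failing.
    """
    pending = dict(urls)
    for attempt in range(attempts):
        pending = {name: url for name, url in pending.items() if not check(url)}
        if not pending:
            break
        if attempt < attempts - 1:
            sleep(delay)
    return sorted(pending)


# Usage: failing = wait_for_services(HEALTH_URLS)
```

An empty return value means both services answered; otherwise the list names the services to inspect with `docker-compose logs`.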
# Check Python service
curl http://localhost:8000/health
# Check Rust service
curl http://localhost:3000/health
# Access API documentation
start http://localhost:8000/docs

# streaming_test.py
import asyncio
import json

import websockets


async def test_streaming():
    async with websockets.connect("ws://localhost:3000/ws/stream") as ws:
        # Send audio chunk (dummy data for demo)
        message = {
            "type": "audio_chunk",
            "data": [0.1] * 16000,  # 1 second @ 16kHz
            "sequence": 0,
        }
        await ws.send(json.dumps(message))
        # Receive result
        result = await ws.recv()
        print(f"Result: {result}")


asyncio.run(test_streaming())

VoiceFlow-Intelligence-Platform/
├── docs/ # 📚 Documentation
│ ├── CAHIER_DES_CHARGES.md # Requirements specification
│ ├── NOTE_DE_CADRAGE.md # Project planning
│ ├── CONCEPTION_TECHNIQUE.md # Technical architecture
│ └── ARCHITECTURE_FLOW.md # Data flow diagrams
│
├── voiceflow-ml/ # 🐍 Python ML Service
│ ├── api/ # FastAPI routes
│ │ ├── main.py # Application entry point
│ │ └── routes/ # API endpoints
│ ├── models/ # ML models
│ │ ├── diarization/ # Speaker diarization model
│ │ └── preprocessing/ # Audio feature extraction
│ ├── services/ # Business logic layer
│ ├── repositories/ # Data access layer
│ ├── core/ # Configuration & utilities
│ ├── tests/ # Unit & integration tests
│ ├── requirements.txt # Python dependencies
│ ├── Dockerfile # Container image
│ └── .env.example # Environment variables template
│
├── voiceflow-inference/ # 🦀 Rust Inference Engine
│ ├── src/
│ │ ├── main.rs # Entry point
│ │ ├── api/ # HTTP API handlers
│ │ ├── inference/ # ONNX Runtime integration
│ │ ├── streaming/ # WebSocket handlers
│ │ └── metrics/ # Prometheus metrics
│ ├── tests/ # Rust tests
│ ├── Cargo.toml # Rust dependencies
│ └── Dockerfile # Multi-stage build
│
├── models/ # 📦 Shared ONNX models
├── data/ # 🗂️ Training datasets (gitignored)
├── docker-compose.yml # 🐳 Full stack orchestration
├── prometheus.yml # 📊 Metrics configuration
└── README.md # 📖 This file
cd voiceflow-ml
# Create virtual environment
python -m venv venv
.\venv\Scripts\activate # Windows PowerShell (macOS/Linux: source venv/bin/activate)
# Install dependencies
pip install -r requirements.txt
# Copy environment variables
cp .env.example .env
# Edit .env with your settings
# Run development server
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

cd voiceflow-inference
# Build project
cargo build --release
# Run tests
cargo test --verbose
# Run service
cargo run --release

Python:
cd voiceflow-ml
pytest tests/ --cov=. --cov-report=html

Rust:
cd voiceflow-inference
cargo test --verbose
cargo clippy -- -D warnings

Interactive API Docs: http://localhost:8000/docs

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /ready | Readiness probe (checks DB & Redis) |
| POST | /api/models/train | Train new model |
| POST | /api/models/{id}/export | Export model to ONNX |
| GET | /api/models | List all models |
| PUT | /api/models/{id}/activate | Activate model for production |
| POST | /api/inference/batch | Submit batch job |
| GET | /api/inference/batch/{job_id} | Get job status/results |
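
Batch jobs are asynchronous, so a client typically polls `GET /api/inference/batch/{job_id}` until the job finishes. The following is a rough sketch of such a loop. The response schema is not documented in this README: the `status` field and its `completed` / `failed` values are assumptions, as are the helper names `fetch_job` and `wait_for_job`.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"  # Python ML Service


def fetch_job(job_id, token, base_url=BASE_URL):
    """GET /api/inference/batch/{job_id} and decode the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/api/inference/batch/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def wait_for_job(job_id, token, fetch=fetch_job, interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll until the job reports a terminal status or max_polls is reached.

    ASSUMPTION: the response carries a "status" field whose terminal values
    are "completed" and "failed"; adjust this to the real schema.
    """
    for _ in range(max_polls):
        job = fetch(job_id, token)
        if job.get("status") in {"completed", "failed"}:
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

The `fetch` parameter exists only so the loop can be exercised without a running service; in normal use the default `fetch_job` is enough.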
curl -X POST http://localhost:8000/api/models/train \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_path": "/data/librispeech.tar.gz",
    "version": "1.3.0",
    "hyperparameters": {
      "learning_rate": 0.001,
      "batch_size": 32,
      "epochs": 50
    }
  }'

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /infer | Synchronous inference |
| WebSocket | /ws/stream | Real-time streaming |
| GET | /metrics | Prometheus metrics |
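
The `/ws/stream` endpoint receives JSON messages shaped like the `audio_chunk` message in the streaming test earlier in this README. As a small sketch of how a longer recording could be split into such messages, the helper below (`chunk_messages`, a hypothetical name) cuts a sample list into fixed-length chunks. It assumes that `sequence` simply counts chunks from 0, which the README does not state explicitly.

```python
import json

SAMPLE_RATE = 16_000  # Hz, matching the 16 kHz audio used in the examples


def chunk_messages(samples, chunk_ms=1000, sample_rate=SAMPLE_RATE):
    """Yield one JSON-encoded audio_chunk message per chunk of samples.

    ASSUMPTION: "sequence" is a 0-based counter that increases by one
    per chunk.
    """
    chunk_size = sample_rate * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for the given sample rate")
    for seq, start in enumerate(range(0, len(samples), chunk_size)):
        yield json.dumps({
            "type": "audio_chunk",
            "data": list(samples[start:start + chunk_size]),
            "sequence": seq,
        })
```

Each yielded string can be passed directly to `ws.send(...)`. The final chunk is shorter when the sample count is not a multiple of the chunk size.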
const ws = new WebSocket('ws://localhost:3000/ws/stream');
ws.onopen = () => {
  // Send audio chunk
  ws.send(JSON.stringify({
    type: 'audio_chunk',
    data: [...audioSamples], // Float32Array @ 16kHz
    sequence: 0
  }));
};
ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  console.log('Diarization:', result.segments);
  console.log('Latency:', result.latency_ms, 'ms');
};

Access Prometheus: http://localhost:9090
Key Metrics:
- inference_latency_seconds - Inference latency histogram
- inference_requests_total - Total inference requests
- inference_errors_total - Total errors
- websocket_connections_active - Active WebSocket connections
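
Because `inference_latency_seconds` is a histogram, the P99 latency can be derived with PromQL's `histogram_quantile` and read through Prometheus' instant-query endpoint (`/api/v1/query`). The sketch below builds that query. The function names and the 5-minute rate window are illustrative choices, and it assumes the histogram is exported with the usual `_bucket` series.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"


def p99_query(metric="inference_latency_seconds", window="5m"):
    """PromQL for the 99th percentile of a histogram over a rate window."""
    return f"histogram_quantile(0.99, sum(rate({metric}_bucket[{window}])) by (le))"


def query_url(promql, base_url=PROMETHEUS_URL):
    """URL for Prometheus' instant-query endpoint, /api/v1/query."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def fetch_p99_seconds(base_url=PROMETHEUS_URL):
    """Run the query; return the P99 in seconds, or None if there is no data."""
    with urllib.request.urlopen(query_url(p99_query(), base_url), timeout=5) as resp:
        result = json.load(resp)["data"]["result"]
    return float(result[0]["value"][1]) if result else None
```

The same PromQL expression can be pasted into the Prometheus web UI or used as a Grafana panel query.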
Access Grafana: http://localhost:3001 (admin/admin)
Pre-configured dashboards:
- Real-Time Inference - Latency (P50/P99), throughput, error rate
- Model Performance - Accuracy trends, version comparison
- System Health - CPU, memory, network metrics
All API endpoints (except /health) require JWT authentication:
# Get token (implement your auth endpoint)
TOKEN=$(curl -s -X POST http://localhost:8000/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username":"user","password":"pass"}' | jq -r '.access_token')
# Use token
curl http://localhost:8000/api/models \
  -H "Authorization: Bearer $TOKEN"

Default: 100 requests/minute per user (configurable via Redis)
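
To make the per-user limit concrete, here is a minimal fixed-window counter. It is only a sketch of the general technique: the README does not say which algorithm the service uses, the class name is hypothetical, and an in-memory dict stands in for Redis.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per user in each `window`-second window.

    ASSUMPTION: fixed-window counting; the real service may use another
    scheme (sliding window, token bucket).
    """

    def __init__(self, limit=100, window=60, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counts = defaultdict(int)  # (user, window index) -> request count

    def allow(self, user):
        """Count the request and return True, or return False if over the limit."""
        key = (user, int(self.clock() // self.window))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

With Redis, the same idea is commonly expressed as an `INCR` on a per-user, per-window key plus an `EXPIRE`, which also takes care of evicting old windows (this sketch never evicts them).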
- Audio file size: Max 100 MB
- Audio duration: Max 30 minutes
- File format validation (magic bytes check)
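
A magic-bytes check inspects the first bytes of an upload instead of trusting its file extension. The sketch below combines it with the 100 MB size limit. The list of accepted formats is an assumption (the README does not enumerate them), the function names are hypothetical, and the 30-minute duration limit is left out because it requires parsing the audio headers.

```python
MAX_BYTES = 100 * 1024 * 1024  # 100 MB limit (interpreted here as MiB)


def sniff_format(header: bytes):
    """Return a format name based on well-known file signatures, or None.

    ASSUMPTION: the set of formats is illustrative, not the service's list.
    """
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:3] == b"ID3":  # MP3 files that start with an ID3v2 tag
        return "mp3"
    return None


def validate_upload(data: bytes, max_bytes: int = MAX_BYTES) -> str:
    """Return the detected format; raise ValueError if too large or unknown."""
    if len(data) > max_bytes:
        raise ValueError(f"file exceeds {max_bytes} bytes")
    fmt = sniff_format(data[:12])
    if fmt is None:
        raise ValueError("unrecognized audio format")
    return fmt
```

MP3 files without an ID3 tag begin directly with a frame header, so a fuller check would also need to recognize the frame sync bits.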
# Build and start
docker-compose up --build -d
# View logs
docker-compose logs -f
# Stop all services
docker-compose down
# Stop and remove volumes
docker-compose down -v

Helm charts and Kubernetes manifests in progress.
| Metric | CPU (Optimized ONNX) | GPU T4 (Projected) |
|---|---|---|
| P99 Latency | 4.48ms ✅ | 3-5ms ✅ |
| Median Latency | 3.36ms | 1-2ms |
| Throughput | 297 req/s | 500-800 req/s |
| Model Size | 10 MB | 10 MB |
| Metric | Target | Actual | Status |
|---|---|---|---|
| End-to-End P99 | < 100ms | 40-80ms | ✅ |
| Model Inference P99 | < 10ms | 4.48ms | ✅ |
| Rust Overhead | < 10ms | ~5-8ms | ✅ |
| Throughput (CPU) | > 100 req/s | 297 req/s | ✅ |
| Memory (Rust) | < 500 MB | 200-300 MB | ✅ |
| Concurrent WebSocket | > 500 | 1000+ | ✅ |
Tested on: Intel/AMD 4-core CPU, 8GB RAM, no GPU
End-to-end includes: network (10-40ms) + Rust processing (~5-8ms) + model inference (4.48ms)
See docs/PERFORMANCE_ANALYSIS.md for detailed benchmarks
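
The itemized components can be added up as a rough latency budget against the 100ms target. This is back-of-the-envelope arithmetic only: percentiles do not add exactly, and the variable names are illustrative.

```python
# Per-component latency ranges in milliseconds, copied from the note above.
BUDGET_MS = {
    "network": (10.0, 40.0),
    "rust_processing": (5.0, 8.0),
    "model_inference": (4.48, 4.48),
}
TARGET_MS = 100.0  # end-to-end P99 target


def total_range(budget):
    """Sum the lower and upper bounds across all components."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


low, high = total_range(BUDGET_MS)
print(f"itemized total: {low:.2f}-{high:.2f} ms, headroom: {TARGET_MS - high:.2f} ms")
```

The itemized upper bound comes out below the top of the reported 40-80ms end-to-end range, so the measured figure presumably includes costs that are not itemized here. That is an assumption; the detailed benchmarks are the authoritative source.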
# Python tests
cd voiceflow-ml
pytest tests/ -v --cov=. --cov-report=term-missing
# Rust tests
cd voiceflow-inference
cargo test --verbose
# Integration tests
docker-compose -f docker-compose.test.yml up --abort-on-container-exit

# HTTP load test (wrk); /api/models requires a JWT, so pass the token
wrk -t4 -c100 -d30s -H "Authorization: Bearer $TOKEN" http://localhost:8000/api/models
# WebSocket load test (custom script)
python tests/load/websocket_load.py --clients 1000 --duration 60

| Document | Description |
|---|---|
| CAHIER_DES_CHARGES.md | Requirements, features, acceptance criteria |
| NOTE_DE_CADRAGE.md | 2-day development plan, milestones |
| CONCEPTION_TECHNIQUE.md | Architecture, data models, API specs |
| ARCHITECTURE_FLOW.md | Request flows, sequence diagrams |
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request

- Python: black, flake8, mypy
- Rust: cargo fmt, cargo clippy
- Tests: Coverage > 80%
This project is licensed under the MIT License - see the LICENSE file for details.
- PyTorch - Deep learning framework
- ONNX - Model interoperability
- Rust - Systems programming language
- FastAPI - Modern Python web framework
- Axum - Ergonomic Rust web framework
Project Maintainer: VoiceFlow Intelligence Team
- GitHub: @FCHEHIDI
- Issues: GitHub Issues
- Advanced speaker embeddings (ResNet + LSTM)
- GPU acceleration (CUDA)
- gRPC Python ↔ Rust communication
- Kubernetes deployment
- Advanced A/B testing (multi-armed bandit)
- Model compression (distillation, INT8 quantization)
- Multi-language support
- Real-time transcription integration
- Cloud deployment (AWS/Azure/GCP)
⭐ Star this repo if you find it useful!
Made with ❤️ by the VoiceFlow Intelligence Team
