This guide helps you resolve common issues with the YOLO Trainer Platform.
- Installation Issues
- Docker Issues
- Backend Issues
- Frontend Issues
- Database Issues
- Training Issues
- Inference Issues
- Performance Issues
Problem: docker: command not found
Solution:
# Install Docker
# For Ubuntu/Debian
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# For macOS
brew install --cask docker
# For Windows
# Download from https://www.docker.com/products/docker-desktop
Problem: docker-compose: command not found
Solution:
# Docker Compose is included in Docker Desktop
# Or install separately
sudo curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
Problem: permission denied while trying to connect to the Docker daemon
Solution:
# Add your user to docker group
sudo usermod -aG docker $USER
# Log out and back in, or run
newgrp docker
Problem: Services fail to start with docker-compose up
Diagnosis:
# Check logs
docker-compose logs
# Check specific service
docker-compose logs backend
docker-compose logs postgres
Common Solutions:
- Port already in use:
# Find what's using the port
sudo lsof -i :8000
sudo lsof -i :3000
sudo lsof -i :5432
# Kill the process or change port in docker-compose.yml
- Insufficient resources:
# Check Docker resources
docker info
# Increase Docker Desktop resources:
# Settings → Resources → Advanced
# Increase CPU and Memory
- Network issues:
# Remove old networks
docker network prune
# Recreate containers
docker-compose down
docker-compose up -d
Problem: Backend can't connect to PostgreSQL
Solution:
# Check if PostgreSQL is running
docker-compose ps postgres
# Check PostgreSQL logs
docker-compose logs postgres
# Verify connection string in backend/.env
DATABASE_URL=postgresql://yolouser:yolopass@postgres:5432/yolodb
# Wait for database to be ready
docker-compose up -d postgres
sleep 10
docker-compose up -d backend
Problem: Permission denied when writing to volumes
Solution:
# Fix permissions
sudo chown -R $USER:$USER ./uploads ./models ./datasets
# Or recreate volumes
# WARNING: -v deletes the project's volumes, including any database data stored in them
docker-compose down -v
docker-compose up -d
Problem: ModuleNotFoundError: No module named 'app'
Solution:
# Ensure you're in the backend directory
cd backend
# Reinstall dependencies
pip install -r requirements.txt
# Check PYTHONPATH
export PYTHONPATH="${PYTHONPATH}:${PWD}"
# Or use uvicorn with correct path
uvicorn app.main:app --reload
Problem: Tables don't exist or schema errors
Solution:
# Tables are auto-created on startup
# But if issues persist:
# Install alembic
pip install alembic
# Create migration
alembic revision --autogenerate -m "Initial migration"
# Apply migration
alembic upgrade head
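# Or drop and recreate
# WARNING: -v deletes the project's volumes, so existing database data is lost
docker-compose down -v
docker-compose up -d postgres
# Wait for PostgreSQL to become ready
sleep 10
docker-compose up -d backend
Problem: 401 Unauthorized or token issues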
Solution:
# Check SECRET_KEY is set in backend/.env
SECRET_KEY=your-secret-key-change-in-production
# Regenerate token
# Login again to get new token
# Check token expiration
ACCESS_TOKEN_EXPIRE_MINUTES=30
# Verify Authorization header format
Authorization: Bearer <token>
Problem: Images won't upload or 413 error
Solution:
# Check file size
MAX_UPLOAD_SIZE=104857600 # 100MB in backend/.env
# Check disk space
df -h
# Check permissions
chmod 755 uploads/ datasets/ models/
# Check file type
# Only JPEG, PNG supported
Problem: Dependencies won't install
Solution:
cd frontend
# Clear cache
npm cache clean --force
# Delete node_modules and package-lock.json
rm -rf node_modules package-lock.json
# Reinstall
npm install
# Try with legacy peer deps
npm install --legacy-peer-deps
Problem: Frontend pages return 404
Solution:
# Check Next.js is running
npm run dev
# Check port
# Frontend should be on http://localhost:3000
# Clear .next cache
rm -rf .next
npm run dev
Problem: Frontend can't connect to backend
Solution:
# Check backend is running
curl http://localhost:8000/health
# Check .env.local
NEXT_PUBLIC_API_URL=http://localhost:8000
# Check CORS settings in backend
# Should allow your frontend origin
# Check browser console for errors
# Open DevTools → Console
Problem: npm run build fails
Solution:
# Fix TypeScript errors
npm run type-check
# Fix linting errors
npm run lint
# Clear cache and rebuild
rm -rf .next
npm run build
Problem: Can't connect to PostgreSQL
Solution:
# Check PostgreSQL is running
docker-compose ps postgres
# Test connection
docker-compose exec postgres psql -U yolouser -d yolodb
# Check credentials in .env
DATABASE_URL=postgresql://yolouser:yolopass@postgres:5432/yolodb
# Restart PostgreSQL
docker-compose restart postgres
Problem: Database operations are slow
Solution:
-- Add indexes (already included in models)
-- Check query performance
EXPLAIN ANALYZE SELECT * FROM datasets;
-- Vacuum database
VACUUM ANALYZE;
Problem: No space left on device
Solution:
# Check disk usage
df -h
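# Clean unused Docker data
# WARNING: this removes all unused images, stopped containers, networks, and unused volumes
# on this host, not only this project's. Back up anything you need first.
docker system prune -a --volumes
# Backup and clean old data
Problem: Training job stays in PENDING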
Diagnosis:
# Check training job logs
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/api/v1/training/1/logs
# Check backend logs
docker-compose logs -f backend
Common Solutions:
- No labeled images:
# Ensure dataset has labeled images
# Check dataset statistics
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/api/v1/datasets/1/statistics
- Insufficient memory:
# Reduce batch size (for example, 8 instead of 16)
{
  "batch_size": 8,
  "epochs": 50
}
- CUDA/GPU issues:
# Check GPU availability
nvidia-smi
# Force CPU training
# Set in training_service.py:
device = 'cpu'
Problem: Training takes too long
Solution:
# Use smaller model
model_type: "yolov8n" # Fastest
# Reduce image size
img_size: 416 # Instead of 640
# Use GPU
# Ensure CUDA is available
# Reduce dataset size or epochs for testing
epochs: 10
Problem: CUDA out of memory or RAM exhausted
Solution:
# Reduce batch size
batch_size: 4 # or even 1
# Use smaller model
model_type: "yolov8n"
# Close other applications
# Increase Docker memory
# Docker Desktop → Settings → Resources → Memory
Problem: Training job fails with error
Diagnosis:
# Check error message in job
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/api/v1/training/1
# Check backend logs
docker-compose logs backend | grep ERROR
Common Solutions:
- Invalid annotations:
# Check annotation format
# Values must be between 0 and 1
# x_center, y_center, width, height
- Missing images:
# Ensure all images exist
# Check dataset directory
ls datasets/1/images/
- Disk space:
df -h
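# Clean up old training runs
# WARNING: this pattern matches every run under every model, including weights
# that deployed models may still use. Narrow the pattern before running it.
rm -rf models/*/train_*
Problem: Can't run inference on model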
Solution:
# Check model exists
curl -H "Authorization: Bearer <token>" \
http://localhost:8000/api/v1/models/1
# Check model file exists
ls models/1/train_*/weights/best.pt
# Ensure model is deployed
curl -X POST -H "Authorization: Bearer <token>" \
http://localhost:8000/api/v1/models/1/deploy
Problem: Predictions take too long
Solution:
# Use smaller model
# yolov8n is fastest
# Use GPU if available
# Check CUDA setup
# Reduce image size
# Resize before uploading
# Enable model caching
# Already implemented in code
Problem: Model detects wrong objects
Solution:
# Train with more data
# Add more labeled images
# Train for more epochs
epochs: 200
# Check class balance
# Ensure all classes have enough examples
# Adjust confidence threshold
confidence: 0.5 # Higher = fewer detections
Problem: API requests take too long
Diagnosis:
# Check response times
time curl http://localhost:8000/api/v1/datasets/
# Check database query performance
# Enable SQL logging in config.py
# (SQLAlchemy's echo flag logs SQL statements; echo_pool only logs connection-pool events)
echo=True
Solutions:
- Add database indexes (already included)
- Enable Redis caching:
# In config.py
REDIS_URL=redis://localhost:6379
- Optimize queries:
# Use eager loading
.options(joinedload(Dataset.images))
- Add pagination:
# Use skip and limit
curl "http://localhost:8000/api/v1/datasets/?skip=0&limit=10"
Problem: System running out of memory
Solution:
# Check memory usage
docker stats
# Reduce worker processes
# In docker-compose.yml
command: uvicorn app.main:app --workers 2
# Close unused applications
# Restart containers
docker-compose restart
Problem: CPU at 100%
Solution:
# Check what's using CPU
docker stats
# Reduce concurrent operations
# Limit batch size in training
# Use async operations
# Already implemented in FastAPI
# Scale horizontally
# Add more backend instances
If you can't resolve your issue:
- Check logs:
# All services
docker-compose logs
# Specific service
docker-compose logs backend
docker-compose logs frontend
- Check documentation:
- Search for errors:
- Copy error message
- Search in GitHub issues
- Search online
- Create an issue:
- Go to GitHub repository
- Click "Issues" → "New Issue"
- Provide:
- Error message
- Steps to reproduce
- System information
- Logs
- Community support:
- Check GitHub Discussions
- Ask on Stack Overflow with tag yolo-trainer
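The "Check logs" and "Search for errors" steps above can be partly automated. The following is a minimal sketch, not part of the platform: it assumes the logs were saved to text first (for example with `docker-compose logs > logs.txt`) and that services mark failures with `ERROR`, `CRITICAL`, or `Traceback`. The function name `extract_errors` and the pattern are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed markers for failure lines; adjust to what the services actually emit.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL|Traceback)\b")


def extract_errors(log_text: str, limit: int = 5) -> list[tuple[str, int]]:
    """Return the most frequent error lines together with their counts."""
    counts = Counter()
    for line in log_text.splitlines():
        if ERROR_PATTERN.search(line):
            # Drop everything up to the first "|", which is where
            # docker-compose puts the service name prefix.
            message = line.split("|", 1)[-1].strip()
            counts[message] += 1
    return counts.most_common(limit)


# Hypothetical log excerpt, used only to demonstrate the function.
sample = """backend  | INFO: Application startup complete.
backend  | ERROR: could not connect to server
backend  | ERROR: could not connect to server
postgres | LOG: database system is ready
"""

for message, count in extract_errors(sample):
    print(f"{count}x {message}")
```

The most frequent message is usually the best candidate to copy into a GitHub issue search. Lines that carry timestamps are counted separately, so strip the timestamps first if the logs include them.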
Enable debug mode for more information:
# In backend/app/main.py
app = FastAPI(debug=True)
# Or set environment variable
DEBUG=True uvicorn app.main:app --reload
// Check browser console (F12)
// Enable verbose logging
console.log(process.env.NODE_ENV)
# Run container interactively
docker-compose run backend /bin/bash
# Check container internals
docker-compose exec backend ls -la
docker-compose exec backend env
- Regular backups:
# Backup database
docker-compose exec postgres pg_dump -U yolouser yolodb > backup.sql
# Backup files
tar -czf backup.tar.gz uploads/ models/ datasets/
- Monitor resources:
# Check disk space regularly
df -h
# Monitor Docker
docker stats
- Keep updated:
# Update dependencies
# Subshells keep each cd from affecting the next command
(cd backend && pip install -r requirements.txt --upgrade)
(cd frontend && npm update)
# Update Docker images
docker-compose pull
docker-compose up -d
- Clean up regularly:
# Remove old containers
docker system prune
# Clean old training runs
rm -rf models/*/train_old_*
# Vacuum database
docker-compose exec postgres psql -U yolouser -d yolodb -c "VACUUM ANALYZE;"
- Docker Documentation: https://docs.docker.com/
- FastAPI Documentation: https://fastapi.tiangolo.com/
- PostgreSQL Documentation: https://www.postgresql.org/docs/
- Ultralytics YOLO: https://docs.ultralytics.com/
- Next.js Documentation: https://nextjs.org/docs
Still having issues? Open an issue on GitHub with:
- Detailed description
- Error messages
- System information
- Steps to reproduce
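One way to gather the items above into a single issue body is sketched below. This is only an illustration using the Python standard library. The helper name `build_issue_report`, its parameters, and the Markdown layout are assumptions, not something the platform provides.

```python
import platform
import sys


def build_issue_report(description, error_message, steps, logs=""):
    """Assemble description, error, steps, and system info into Markdown text."""
    system_info = {
        "OS": platform.platform(),
        "Python": sys.version.split()[0],
        "Architecture": platform.machine(),
    }
    lines = [
        "## Description", description, "",
        "## Error message", error_message, "",
        "## Steps to reproduce",
    ]
    # Number the reproduction steps starting from 1.
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", "## System information"]
    lines += [f"- {key}: {value}" for key, value in system_info.items()]
    if logs:
        # The logs section is optional and only added when logs are supplied.
        lines += ["", "## Logs", logs.strip()]
    return "\n".join(lines)


# Placeholder values, shown only to illustrate the call.
report = build_issue_report(
    description="Training job stays in PENDING",
    error_message="(paste the exact error here)",
    steps=["Create a dataset", "Start a training job"],
)
print(report)
```

Paste the printed text into the new issue and attach relevant output from `docker-compose logs` through the `logs` argument. Review it first for secrets such as `SECRET_KEY` or database passwords.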