A service to stream audio to Yoto devices, monitor events via MQTT, and manage interactive audio experiences with a web UI. Includes support for "Choose Your Own Adventure" style interactive stories.
Status: ✅ Production server implemented and ready for deployment! (See Details)
New to streaming from your own service? Check out our Quick Start: Streaming Guide to get up and running in 10 minutes!
For general project setup: See our Quick Start Guide
- Audio Streaming: Stream custom audio content to Yoto players
- Dynamic Playlists: Server-controlled playlists that stream multiple files sequentially (details)
- Real-time Monitoring: Track player events via MQTT (play/pause, button presses, battery status)
- Interactive Cards: Create Choose Your Own Adventure style experiences using physical button controls
- Progressive Web App (PWA): Install on mobile devices for an app-like experience (Mobile Guide)
- Web UI: Manage your audio library, configure cards, and write interactive scripts
- Card Management: Upload, organize, and configure custom Yoto cards
- Multi-format Support: Automatic audio conversion to Yoto-compatible formats
- Speech-to-Text Transcription: Automatic transcription of audio files using OpenAI Whisper
- Automatic Token Refresh: Background task keeps OAuth tokens valid indefinitely (details)
graph TB
subgraph "Client Layer"
WebUI[Web Browser / UI]
PWAMobile[📱 PWA Mobile App<br/><i>installed on device</i>]
end
subgraph "Application Layer"
API[FastAPI Server<br/>REST API]
MQTT_Handler[MQTT Event Handler<br/>Real-time Events]
IconService[Icon Service<br/>Image Management]
end
subgraph "Core Services"
AudioMgr[Audio Manager<br/>Upload & Conversion]
ScriptEngine[Script Engine<br/>CYOA Logic]
CardMgr[Card Manager<br/>MYO Cards]
end
subgraph "Data Layer"
DB[(SQLite/PostgreSQL<br/>Cards, Scripts, Metadata)]
FileStorage[File Storage<br/>Audio Files & Icons]
end
subgraph "External Services"
YotoAPI[Yoto REST API<br/>yoto.dev]
YotoMQTT[Yoto MQTT Broker<br/>mqtt.yoto.io]
end
subgraph "Yoto Devices"
YotoPlayer1[Yoto Player<br/>Living Room]
YotoMini[Yoto Mini<br/>Bedroom]
end
WebUI -->|HTTP/WebSocket| API
PWAMobile -->|HTTP/WebSocket| API
API --> AudioMgr
API --> ScriptEngine
API --> CardMgr
API --> IconService
API -->|Subscribe/Publish| MQTT_Handler
AudioMgr --> FileStorage
ScriptEngine --> DB
CardMgr --> DB
IconService --> FileStorage
MQTT_Handler --> DB
API -->|Create Cards<br/>Control Players| YotoAPI
MQTT_Handler <-->|Subscribe to Events<br/>Device Status| YotoMQTT
YotoAPI -->|Commands| YotoPlayer1
YotoAPI -->|Commands| YotoMini
YotoMQTT <-->|Events & Status| YotoPlayer1
YotoMQTT <-->|Events & Status| YotoMini
YotoPlayer1 -->|Stream Audio| FileStorage
YotoMini -->|Stream Audio| FileStorage
style WebUI fill:#e1f5ff
style API fill:#fff3e0
style MQTT_Handler fill:#fff3e0
style YotoAPI fill:#f3e5f5
style YotoMQTT fill:#f3e5f5
style YotoPlayer1 fill:#e8f5e9
style YotoMini fill:#e8f5e9
style DB fill:#fce4ec
style FileStorage fill:#fce4ec
- FastAPI Server: REST API for managing cards, audio, and device control
- MQTT Handler: Real-time event processing from Yoto devices (button presses, playback status)
- Icon Service: Manages 16x16 display icons for Yoto Mini devices
- Audio Manager: Handles file uploads, format conversion, and streaming
- Script Engine: Executes interactive "Choose Your Own Adventure" logic
- Card Manager: Creates and manages Yoto MYO (Make Your Own) cards
- User → Web UI → API: Manage cards, upload audio, configure scripts
- API → Yoto API: Create cards, control playback, send commands
- Yoto Devices → MQTT: Real-time events (button presses, status updates)
- MQTT → Event Handler → API → Web UI: Real-time updates displayed to users
- Yoto Devices → File Storage: Stream audio content directly from your server
For detailed architecture information, see Architecture Documentation.
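The "Yoto Devices → MQTT → Event Handler" flow above could be handled by a small state tracker. The sketch below is illustrative only: the `PlayerState` class, the `apply_event` function, and the payload field names (`event`, `chapter`, `battery`) are hypothetical and do not reflect Yoto's actual MQTT schema. See the Yoto MQTT Reference for the real topics and payloads.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerState:
    """Last known state of one player (hypothetical shape)."""
    playing: bool = False
    chapter: Optional[int] = None
    battery: Optional[int] = None


def apply_event(state: PlayerState, payload: str) -> PlayerState:
    """Update state from one JSON event; unknown event types are ignored."""
    msg = json.loads(payload)
    kind = msg.get("event")  # assumed field name, not Yoto's real schema
    if kind == "playback_started":
        state.playing = True
        state.chapter = msg.get("chapter", state.chapter)
    elif kind in ("playback_paused", "playback_complete"):
        state.playing = False
    if "battery" in msg:
        state.battery = msg["battery"]
    return state


state = PlayerState()
apply_event(state, '{"event": "playback_started", "chapter": 1, "battery": 87}')
apply_event(state, '{"event": "playback_paused"}')
print(state)
```

Keeping the update logic in a pure function like this makes it easy to unit test without a broker connection.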
This diagram illustrates how a Yoto player interacts with a straightforward chapter book card streaming from your own service:
sequenceDiagram
participant User as Child/User
participant Player as Yoto Player
participant MQTT as MQTT Broker<br/>(Yoto Cloud)
participant Server as Your Streaming<br/>Server
Note over User,Server: Card Insertion & Initial Setup
User->>Player: Inserts chapter book card
activate Player
Player->>MQTT: Publish: Card inserted event
MQTT->>Server: Forward: Card inserted
Server->>Server: Load card metadata<br/>(3 chapters)
Note over User,Server: Chapter 1 Playback Begins
Player->>Server: HTTP GET: /audio/chapter1.mp3
Server-->>Player: Stream audio (Chapter 1)
Player->>Player: Begin playback
Player->>MQTT: Publish: Playback started<br/>Chapter 1
MQTT->>Server: Forward: Status update
Note over User,Server: Real-time Status Updates
loop Every few seconds during playback
Player->>MQTT: Publish: Track position,<br/>battery level, volume
MQTT->>Server: Forward: Status updates
end
Note over User,Server: User Navigation - Skip to Chapter 2
User->>Player: Presses "Next Chapter" button
Player->>MQTT: Publish: Button press event<br/>(next chapter)
MQTT->>Server: Forward: Button event
Player->>Player: Stop Chapter 1
Player->>Server: HTTP GET: /audio/chapter2.mp3
Server-->>Player: Stream audio (Chapter 2)
Player->>Player: Begin playback
Player->>MQTT: Publish: Now playing<br/>Chapter 2
Note over User,Server: Pause & Resume
User->>Player: Presses "Pause" button
Player->>Player: Pause playback
Player->>MQTT: Publish: Playback paused
MQTT->>Server: Forward: Paused status
User->>Player: Presses "Play" button
Player->>Player: Resume playback
Player->>MQTT: Publish: Playback resumed
Note over User,Server: Chapter 2 Completes
Player->>Player: Chapter 2 ends
Player->>MQTT: Publish: Chapter complete
Player->>Server: HTTP GET: /audio/chapter3.mp3
Server-->>Player: Stream audio (Chapter 3)
Player->>MQTT: Publish: Now playing<br/>Chapter 3
Note over User,Server: Book Completion
Player->>Player: Chapter 3 ends
Player->>MQTT: Publish: Playback complete<br/>All chapters finished
MQTT->>Server: Forward: Complete event
Player->>Player: Return to idle state
deactivate Player
Note over User,Server: Card Removed
User->>Player: Removes card
Player->>MQTT: Publish: Card removed
MQTT->>Server: Forward: Card removed
Key Points:
- Player-Initiated Streaming: The Yoto player directly requests audio from your server via HTTP
- MQTT for Events: All user interactions and status updates flow through Yoto's MQTT broker
- Sequential Chapters: Chapters play in order and can be navigated using physical buttons
- Real-time Monitoring: Your server receives live updates about playback state, position, and battery
- Simple HTTP: Audio streaming uses standard HTTP GET requests - no special protocols needed
For interactive (Choose Your Own Adventure) stories where chapter selection depends on button choices, see the Interactive Cards section.
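Because the player fetches audio with plain HTTP GET requests, the server side mostly needs to map a request path to a file safely. The following is a minimal sketch, not the project's actual implementation: `resolve_audio_path` is a hypothetical helper, and the `/audio/` prefix is taken from the diagram above.

```python
from pathlib import Path
from typing import Optional


def resolve_audio_path(audio_root: Path, request_path: str) -> Optional[Path]:
    """Map a request path like /audio/chapter1.mp3 to a file under audio_root.

    Returns None for paths outside /audio/ or paths that escape audio_root.
    """
    prefix = "/audio/"
    if not request_path.startswith(prefix):
        return None
    root = audio_root.resolve()
    candidate = (root / request_path[len(prefix):]).resolve()
    # Reject anything that is not strictly inside the audio root
    # (covers "..", absolute paths, and the bare "/audio/" prefix).
    if root not in candidate.parents:
        return None
    return candidate


print(resolve_audio_path(Path("audio_files"), "/audio/chapter1.mp3"))
print(resolve_audio_path(Path("audio_files"), "/audio/../secrets.txt"))
```

The returned path still has to be checked for existence before it is streamed; the sketch only covers the path mapping.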
- Python 3.9 or higher
- A Yoto player and Yoto account
- Yoto API client ID (get from yoto.dev)
git clone https://github.com/earchibald/yoto-smart-stream.git
cd yoto-smart-stream
# Copy environment template
cp .env.example .env
# Edit .env and add your Yoto client ID
# YOTO_CLIENT_ID=your_client_id_here
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the simple client example to authenticate
python examples/simple_client.py
Follow the prompts to authenticate. Your refresh token will be saved for future use.
# Run the production server (recommended)
python -m yoto_smart_stream
# Or use uvicorn directly
uvicorn yoto_smart_stream.api:app --reload --port 8080
# Or run the example server (for learning/testing)
python examples/basic_server.py
Visit http://localhost:8080/docs for interactive API documentation.
- Quick Start Guide: Get up and running in 10 minutes - from installation to working API
- Production Server Guide: Complete guide to the production server implementation
- Migration Guide: Migrate from examples to production server
- Testing Guide: Comprehensive testing instructions, coverage reports, and quality checks
- Railway Deployment Guide: Deploy to Railway.app with automated CI/CD
- Railway Direct Inspection: Direct API access for service inspection and troubleshooting (NEW)
- Railway PR Environments: Automatic ephemeral environments for pull requests
- Validating PR Environments: How to validate Railway PR environments are working correctly (NEW)
- Railway MCP Tool Validation: Railway MCP server setup and validation (NEW)
- Railway Shared Development: Coordinated access to shared dev environment
- Railway Token Setup: Configure separate tokens per environment
- Codespaces Railway Setup: Configure Railway access for GitHub Codespaces
- Copilot Workspace Configuration: Network access and Railway MCP server for GitHub Copilot Workspace (NEW)
- Cloud Agent Railway Tokens: Provision Railway tokens for Cloud Agents (GitHub Copilot Workspace) (NEW)
- Cloud Agent Quick Reference: Quick guide for enabling Cloud Agent Railway access (NEW)
- AWS Cost-Optimization Report: Complete AWS architecture analysis with cost breakdowns ($5-47/month options) (NEW)
- AWS Cost Quick Reference: Fast decision matrix for AWS deployment options (NEW)
- Streaming from Your Own Service: Stream audio from your server (recommended approach)
- Dynamic Audio Streaming: Create server-controlled playlists that stream multiple files sequentially (NEW)
- Creating MYO Cards: Traditional approach - upload audio to Yoto's servers
- Icon Management Guide: Working with display icons for Yoto Mini
- Speech-to-Text Transcription: Automatic transcription of audio files (NEW)
- Yoto API Reference: Complete API specification with endpoints, MQTT topics, and code examples
- Yoto MQTT Reference: Deep dive into MQTT event service implementation and real-time communication
- Architecture Guide: System design and implementation recommendations
- Planning Questions: Open questions and decision points
- Getting Started Guide: Step-by-step setup instructions
This project includes specialized AI skills for development workflows:
- Railway Service Management: Railway deployment and infrastructure management
- Yoto Smart Stream: Yoto API integration, audio streaming, and MQTT handling
- Yoto Testing: Comprehensive testing guide and automation
See Copilot Instructions for detailed skill usage.
This project is configured for GitHub Codespaces with a complete development environment:
- Click "Code" → "Create codespace on main"
- Wait for the environment to set up automatically
- Start developing!
Note: GitHub Copilot Workspace has network access configured to test Railway deployments directly. See Copilot Workspace Network Configuration for details.
# Install development dependencies
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
# Run linter
ruff check .
# Format code
black .
Deploy to Railway with automated CI/CD and environment-specific tokens:
# Staging: Automatic on push to develop branch
git push origin develop
# Development: Manual with coordination
# Via GitHub Actions: Railway Development (Shared Environment) workflow
# Pull Requests: Automatic via Railway native PR Environments
# Open a PR → Railway creates pr-{number} environment automatically
Persistent Storage: Railway volumes are configured to persist Yoto OAuth tokens across deployments and restarts. Tokens are stored in /data/.yoto_refresh_token on Railway, ensuring authentication survives instance restarts.
PR Environments: Railway automatically creates ephemeral environments for pull requests with zero configuration. See Railway PR Environments Guide.
Static Environments: Uses pre-registered callback URLs for Yoto OAuth compatibility.
Token Security: Production uses a single Railway token (RAILWAY_TOKEN_PROD). Application secrets like YOTO_CLIENT_ID are stored as Railway Shared Variables. See GitHub Secrets Setup.
Status:
- ✅ Production (main branch) - Auto-deployed with RAILWAY_TOKEN_PROD
- ✅ PR Environments (all PRs) - Auto-created by Railway native feature, inherits secrets via Shared Variables
Resources:
- GitHub Secrets Setup - Configure deployment secrets
- PR Environments Guide - Automatic PR deployments
- Deployment Guide - Complete setup and deployment instructions
- Quick Reference - Common deployment commands
All deployments include a health check endpoint:
curl https://your-app.up.railway.app/health
from yoto_api import YotoManager
# Initialize and authenticate
ym = YotoManager(client_id="your_client_id")
ym.set_refresh_token("your_refresh_token")
ym.check_and_refresh_token()
# Get players
ym.update_player_status()
for player_id, player in ym.players.items():
    print(f"{player.name}: {'Online' if player.online else 'Offline'}")
# Control a player
ym.pause_player(player_id)
ym.play_player(player_id)
ym.set_volume(player_id, 10)
python examples/mqtt_listener.py
python examples/basic_server.py
Then use the API:
# List players
curl http://localhost:8080/api/players
# Control a player
curl -X POST http://localhost:8080/api/players/{player_id}/control \
-H "Content-Type: application/json" \
-d '{"action": "pause"}'
Create your own custom audio cards for Yoto players with two approaches:
Point cards to your own streaming URL for complete control:
# Create card that streams from your server
card_data = {
    "title": "My Streaming Story",
    "content": {
        "chapters": [{
            "tracks": [{
                "url": "https://your-server.com/audio/story.mp3"  # YOUR server!
            }]
        }]
    }
}
Benefits:
- ✅ Update audio without recreating cards
- ✅ Dynamic content (time-based, personalized, etc.)
- ✅ No file size limits
- ✅ Complete control over content
Quick Start: See Streaming from Your Own Service
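For a multi-chapter book, the same structure can be generated from a list of URLs. This is a sketch under assumptions: `build_streaming_card` is a hypothetical helper that mirrors only the fields shown in the example above, and a real card payload may need additional fields described in the streaming guide.

```python
from typing import Dict, List


def build_streaming_card(title: str, chapter_urls: List[str]) -> Dict:
    """Build a card payload with one chapter per URL and one track per chapter."""
    if not chapter_urls:
        raise ValueError("at least one chapter URL is required")
    return {
        "title": title,
        "content": {
            "chapters": [{"tracks": [{"url": url}]} for url in chapter_urls]
        },
    }


card_data = build_streaming_card(
    "My Streaming Story",
    [
        "https://your-server.com/audio/chapter1.mp3",
        "https://your-server.com/audio/chapter2.mp3",
    ],
)
print(len(card_data["content"]["chapters"]))
```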
Upload audio files to Yoto's storage:
from yoto_api import YotoManager
# Authenticate
ym = YotoManager(client_id="your_client_id")
ym.set_refresh_token("your_refresh_token")
# Create a custom card with your audio
# 1. Calculate file hash
# 2. Get upload URL
# 3. Upload audio file
# 4. Create card with metadata
# 5. Play on device
Complete Guide: See Creating MYO Cards for detailed instructions including:
- Audio file preparation and upload
- Cover image creation
- Multi-chapter card creation
- Complete Python code examples
- Troubleshooting tips
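Step 1 of the upload outline (calculating the file hash) can be sketched with the standard library. This assumes the upload endpoint expects a hex-encoded SHA-256 digest, which this README does not confirm; check the Creating MYO Cards guide for the exact algorithm and encoding. `file_sha256` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large audio files are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat regardless of the audio file size.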
Create interactive stories that respond to button presses:
{
  "card_id": "adventure-001",
  "chapters": {
    "1": {
      "audio_file_id": "intro.mp3",
      "choices": {
        "left": {"next_chapter": 2},
        "right": {"next_chapter": 3}
      }
    },
    "2": {
      "audio_file_id": "left-path.mp3",
      "choices": {
        "left": {"next_chapter": 4},
        "right": {"next_chapter": 5}
      }
    }
  }
}
See Architecture Guide for detailed implementation.
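A script in this shape can be walked with a single lookup per button press. The sketch below is one possible reading of the format, not the project's script engine: `next_chapter` is a hypothetical function, and the button names `left` and `right` come from the example above.

```python
import json
from typing import Optional

SCRIPT = json.loads("""
{
  "card_id": "adventure-001",
  "chapters": {
    "1": {"audio_file_id": "intro.mp3",
          "choices": {"left": {"next_chapter": 2}, "right": {"next_chapter": 3}}},
    "2": {"audio_file_id": "left-path.mp3",
          "choices": {"left": {"next_chapter": 4}, "right": {"next_chapter": 5}}}
  }
}
""")


def next_chapter(script: dict, current: int, button: str) -> Optional[int]:
    """Return the chapter a button press leads to, or None if no choice exists."""
    chapter = script["chapters"].get(str(current))
    if chapter is None:
        return None
    choice = chapter.get("choices", {}).get(button)
    return choice["next_chapter"] if choice else None


print(next_chapter(SCRIPT, 1, "left"))   # 2
print(next_chapter(SCRIPT, 2, "right"))  # 5
print(next_chapter(SCRIPT, 3, "left"))   # None: chapter 3 is not defined in this excerpt
```

Returning `None` for a missing chapter or choice lets the caller decide whether to replay the current chapter or end the story.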
yoto-smart-stream/
├── .devcontainer/ # GitHub Codespaces configuration
├── .github/
│ ├── skills/ # Custom AI skills for Railway, Yoto API, and testing
│ └── workflows/ # GitHub Actions CI/CD
├── docs/ # Documentation
├── examples/ # Example scripts
│ ├── simple_client.py # Basic API usage
│ ├── mqtt_listener.py # Event monitoring
│ └── basic_server.py # FastAPI server
├── yoto_smart_stream/ # Main package (production server)
├── tests/ # Test suite
├── pyproject.toml # Project configuration
├── requirements.txt # Dependencies
└── README.md # This file
Contributions are welcome! Please read our contributing guidelines before submitting PRs.
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and linting
- Submit a pull request
When making significant architectural changes:
- Update the Architecture Diagram: Edit the Mermaid diagram in README.md (🏗️ Architecture section)
- Update Architecture Docs: Sync changes with docs/ARCHITECTURE.md for detailed explanations
- Key Areas to Check:
- New services or components
- Changed data flows
- New external integrations
- Modified APIs or protocols
The architectural diagram uses Mermaid syntax and renders automatically on GitHub. Test your changes locally with a Mermaid preview tool or GitHub's preview feature.
For detailed instructions, see the Architecture Diagram Maintenance Guide.
MIT License - see LICENSE file for details
- yoto_api by cdnninja - Python wrapper for Yoto API
- Yoto Play for creating an amazing audio player for kids
- Community contributors and testers
This project is not affiliated with, endorsed by, or sponsored by Yoto Play. It's an independent community project built using publicly available APIs.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Yoto API: yoto.dev
- Project setup and documentation
- Basic API client examples
- Icon management module (100% complete, 96% test coverage)
- FastAPI server implementation with lifespan management
- MQTT event monitoring
- Comprehensive testing suite (137 tests passing)
- Code quality tooling (ruff, black, pytest, mypy)
- Quick start and testing guides
- Dynamic audio streaming with queue management
- Text-to-speech integration
- Speech-to-text transcription (OpenAI Whisper)
- Audio management system (basic upload/conversion)
- Interactive script engine
- Web UI for queue management
- Queue persistence (database storage)
- Cloud deployment guides
- Progressive Web App (PWA) for mobile devices
Yoto Smart Stream is now available as a Progressive Web App! Install it on your mobile device for an app-like experience without going through an app store.
- ✅ Install on iOS/Android: Add to home screen for quick access
- ✅ Offline Support: Core features work without internet connection
- ✅ Native Feel: Full-screen app experience without browser UI
- ✅ Auto Updates: Always get the latest version automatically
- ✅ Mobile Optimized: Touch-friendly interface with 44px touch targets
- ✅ Cross-Platform: Works on iOS, Android, and Desktop
iOS (Safari):
- Visit your Yoto Smart Stream URL
- Tap Share → "Add to Home Screen"
- Launch from your home screen
Android (Chrome):
- Visit your Yoto Smart Stream URL
- Tap "Install" prompt or Menu → "Install app"
- Launch from your app drawer
For detailed installation instructions, troubleshooting, and technical details, see the PWA Mobile Installation Guide.
Yoto Smart Stream supports automatic transcription of audio files using OpenAI Whisper. Transcription is disabled by default to keep container builds fast; enable it only when you need STT.
- Automatic Transcription: Audio files are automatically transcribed when uploaded (when enabled)
- Manual Transcription: Trigger transcription on-demand for any audio file
- Transcript Storage: Transcripts are stored in the database and associated with audio files
- UI Integration: View, manage, and retry transcriptions from the Audio Library page
- TTS Integration: Text-to-speech generated audio automatically stores the source text as transcript
- Install optional dependencies (not included by default to keep builds small):
pip install openai-whisper torch torchaudio
- Set the feature flag:
export TRANSCRIPTION_ENABLED=true
- (Optional) Choose a model:
export TRANSCRIPTION_MODEL=base # tiny|base|small|medium|large
- Restart the app so the settings take effect.
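On the application side, these two variables could be read and validated at startup. How the project actually parses them is not shown in this README, so treat the following as a sketch: `load_transcription_settings` is a hypothetical helper, and accepting only the literal value `true` is an assumption.

```python
import os
from typing import Mapping, Tuple

VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def load_transcription_settings(env: Mapping[str, str] = os.environ) -> Tuple[bool, str]:
    """Return (enabled, model) from the environment, defaulting to (False, "base")."""
    enabled = env.get("TRANSCRIPTION_ENABLED", "false").strip().lower() == "true"
    model = env.get("TRANSCRIPTION_MODEL", "base").strip().lower()
    if model not in VALID_MODELS:
        raise ValueError(
            f"TRANSCRIPTION_MODEL must be one of {VALID_MODELS}, got {model!r}"
        )
    return enabled, model


print(load_transcription_settings({"TRANSCRIPTION_ENABLED": "true"}))
```

Failing fast on an unknown model name surfaces typos at startup instead of at the first upload.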
When you upload an audio file via the Audio Library page and transcription is enabled, it starts automatically:
- Navigate to the Audio Library page
- Upload or record an audio file
- The system will automatically transcribe the audio in the background
- Once complete, a "📝 Transcript" button appears next to the audio file
To manually trigger transcription for an existing audio file:
- Navigate to the Audio Library page
- Find the audio file you want to transcribe
- Click the "📝 Transcribe" button
- Wait for the transcription to complete
- Click "📝 Transcript" to view the result
- Click the "📝 Transcript" button next to any audio file with a completed transcription
- The transcript will appear in a modal dialog
- You can copy the text or close the modal
GET /api/audio/{filename}/transcript
Returns the transcript for a specific audio file.
Response:
{
  "filename": "my-audio.mp3",
  "transcript": "This is the transcribed text...",
  "status": "completed",
  "error": null,
  "transcribed_at": "2026-01-14T08:00:00Z"
}
POST /api/audio/{filename}/transcribe
Manually trigger transcription for an audio file.
Response:
{
  "success": true,
  "filename": "my-audio.mp3",
  "status": "completed",
  "transcript_length": 1234,
  "message": "Transcription completed successfully"
}
The transcription service uses the OpenAI Whisper "base" model by default, which provides a good balance between speed and accuracy. To use a different model, set the TRANSCRIPTION_MODEL environment variable.
Model Options:
- tiny - Fastest, lowest accuracy
- base - Good balance (default)
- small - Better accuracy, slower
- medium - High accuracy, much slower
- large - Best accuracy, very slow
- Model: OpenAI Whisper (base)
- Dependencies: openai-whisper, torch, torchaudio
- Storage: SQLite database with transcript text, status, and metadata
- Processing: Currently synchronous (TODO: move to background queue for production)
- Transcription currently runs synchronously and may cause request timeouts for very long audio files
- For production deployments, consider implementing a background task queue (Celery, RQ, or FastAPI BackgroundTasks)
- Whisper models are compute-intensive: larger models need significant CPU time, and a GPU speeds up transcription considerably
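One way to move transcription off the request path is a small worker pool built on the standard library. The sketch below is not the project's implementation: `TranscriptionQueue` and `fake_transcribe` are hypothetical names, and apart from `completed` the status strings are assumptions. A production setup might prefer Celery, RQ, or FastAPI BackgroundTasks as noted above.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


class TranscriptionQueue:
    """Run transcriptions on worker threads so requests can return immediately."""

    def __init__(self, transcribe: Callable[[str], str], workers: int = 1) -> None:
        self._transcribe = transcribe
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, filename: str) -> None:
        """Queue a file unless a job for it is already registered."""
        if filename not in self._jobs:
            self._jobs[filename] = self._pool.submit(self._transcribe, filename)

    def status(self, filename: str) -> str:
        """Report the job state without blocking."""
        job = self._jobs.get(filename)
        if job is None:
            return "not_found"
        if not job.done():
            return "pending"
        return "error" if job.exception() else "completed"

    def result(self, filename: str, timeout: Optional[float] = None) -> str:
        """Block until the transcript is ready (re-raises if the job failed)."""
        return self._jobs[filename].result(timeout=timeout)


def fake_transcribe(filename: str) -> str:
    # Stand-in for the real Whisper call, which is not shown in this README.
    return f"transcript of {filename}"


jobs = TranscriptionQueue(fake_transcribe)
jobs.submit("my-audio.mp3")
print(jobs.result("my-audio.mp3"))
print(jobs.status("my-audio.mp3"))
```

With this shape, the upload endpoint would call `submit` and return at once, while the transcript endpoint would report `status` until the job finishes.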
Made with ❤️ for the Yoto community