Yoto Smart Stream

A service to stream audio to Yoto devices, monitor events via MQTT, and manage interactive audio experiences with a web UI. Includes support for "Choose Your Own Adventure" style interactive stories.

Status: ✅ Production server implemented and ready for deployment! (See Details)

🚀 Quick Start

New to streaming from your own service? Check out our Quick Start: Streaming Guide to get up and running in 10 minutes!

For general project setup: See our Quick Start Guide

🎯 Features

  • Audio Streaming: Stream custom audio content to Yoto players
  • Dynamic Playlists: Server-controlled playlists that stream multiple files sequentially (details)
  • Real-time Monitoring: Track player events via MQTT (play/pause, button presses, battery status)
  • Interactive Cards: Create Choose Your Own Adventure style experiences using physical button controls
  • Progressive Web App (PWA): Install on mobile devices for an app-like experience (Mobile Guide)
  • Web UI: Manage your audio library, configure cards, and write interactive scripts
  • Card Management: Upload, organize, and configure custom Yoto cards
  • Multi-format Support: Automatic audio conversion to Yoto-compatible formats
  • Speech-to-Text Transcription: Automatic transcription of audio files using OpenAI Whisper
  • Automatic Token Refresh: Background task keeps OAuth tokens valid indefinitely (details)

🏗️ Architecture

graph TB
    subgraph "Client Layer"
        WebUI[Web Browser / UI]
        PWAMobile[📱 PWA Mobile App<br/><i>installed on device</i>]
    end

    subgraph "Application Layer"
        API[FastAPI Server<br/>REST API]
        MQTT_Handler[MQTT Event Handler<br/>Real-time Events]
        IconService[Icon Service<br/>Image Management]
    end

    subgraph "Core Services"
        AudioMgr[Audio Manager<br/>Upload & Conversion]
        ScriptEngine[Script Engine<br/>CYOA Logic]
        CardMgr[Card Manager<br/>MYO Cards]
    end

    subgraph "Data Layer"
        DB[(SQLite/PostgreSQL<br/>Cards, Scripts, Metadata)]
        FileStorage[File Storage<br/>Audio Files & Icons]
    end

    subgraph "External Services"
        YotoAPI[Yoto REST API<br/>yoto.dev]
        YotoMQTT[Yoto MQTT Broker<br/>mqtt.yoto.io]
    end

    subgraph "Yoto Devices"
        YotoPlayer1[Yoto Player<br/>Living Room]
        YotoMini[Yoto Mini<br/>Bedroom]
    end

    WebUI -->|HTTP/WebSocket| API
    PWAMobile -->|HTTP/WebSocket| API

    API --> AudioMgr
    API --> ScriptEngine
    API --> CardMgr
    API --> IconService
    API -->|Subscribe/Publish| MQTT_Handler

    AudioMgr --> FileStorage
    ScriptEngine --> DB
    CardMgr --> DB
    IconService --> FileStorage
    MQTT_Handler --> DB

    API -->|Create Cards<br/>Control Players| YotoAPI
    MQTT_Handler <-->|Subscribe to Events<br/>Device Status| YotoMQTT

    YotoAPI -->|Commands| YotoPlayer1
    YotoAPI -->|Commands| YotoMini
    YotoMQTT <-->|Events & Status| YotoPlayer1
    YotoMQTT <-->|Events & Status| YotoMini

    YotoPlayer1 -->|Stream Audio| FileStorage
    YotoMini -->|Stream Audio| FileStorage

    style WebUI fill:#e1f5ff
    style API fill:#fff3e0
    style MQTT_Handler fill:#fff3e0
    style YotoAPI fill:#f3e5f5
    style YotoMQTT fill:#f3e5f5
    style YotoPlayer1 fill:#e8f5e9
    style YotoMini fill:#e8f5e9
    style DB fill:#fce4ec
    style FileStorage fill:#fce4ec

Key Components

  • FastAPI Server: REST API for managing cards, audio, and device control
  • MQTT Handler: Real-time event processing from Yoto devices (button presses, playback status)
  • Icon Service: Manages 16x16 display icons for Yoto Mini devices
  • Audio Manager: Handles file uploads, format conversion, and streaming
  • Script Engine: Executes interactive "Choose Your Own Adventure" logic
  • Card Manager: Creates and manages Yoto MYO (Make Your Own) cards

Data Flow

  1. User → Web UI → API: Manage cards, upload audio, configure scripts
  2. API → Yoto API: Create cards, control playback, send commands
  3. Yoto Devices → MQTT: Real-time events (button presses, status updates)
  4. MQTT → Event Handler → API → Web UI: Real-time updates displayed to users
  5. Yoto Devices → File Storage: Stream audio content directly from your server

For detailed architecture information, see Architecture Documentation.

📖 Chapter Book User Flow (Yoto Player Perspective)

This diagram illustrates how a Yoto player interacts with a straightforward chapter book card streaming from your own service:

sequenceDiagram
    participant User as Child/User
    participant Player as Yoto Player
    participant MQTT as MQTT Broker<br/>(Yoto Cloud)
    participant Server as Your Streaming<br/>Server

    Note over User,Server: Card Insertion & Initial Setup
    User->>Player: Inserts chapter book card
    activate Player
    Player->>MQTT: Publish: Card inserted event
    MQTT->>Server: Forward: Card inserted
    Server->>Server: Load card metadata<br/>(3 chapters)

    Note over User,Server: Chapter 1 Playback Begins
    Player->>Server: HTTP GET: /audio/chapter1.mp3
    Server-->>Player: Stream audio (Chapter 1)
    Player->>Player: Begin playback
    Player->>MQTT: Publish: Playback started<br/>Chapter 1
    MQTT->>Server: Forward: Status update

    Note over User,Server: Real-time Status Updates
    loop Every few seconds during playback
        Player->>MQTT: Publish: Track position,<br/>battery level, volume
        MQTT->>Server: Forward: Status updates
    end

    Note over User,Server: User Navigation - Skip to Chapter 2
    User->>Player: Presses "Next Chapter" button
    Player->>MQTT: Publish: Button press event<br/>(next chapter)
    MQTT->>Server: Forward: Button event
    Player->>Player: Stop Chapter 1
    Player->>Server: HTTP GET: /audio/chapter2.mp3
    Server-->>Player: Stream audio (Chapter 2)
    Player->>Player: Begin playback
    Player->>MQTT: Publish: Now playing<br/>Chapter 2

    Note over User,Server: Pause & Resume
    User->>Player: Presses "Pause" button
    Player->>Player: Pause playback
    Player->>MQTT: Publish: Playback paused
    MQTT->>Server: Forward: Paused status

    User->>Player: Presses "Play" button
    Player->>Player: Resume playback
    Player->>MQTT: Publish: Playback resumed

    Note over User,Server: Chapter 2 Completes
    Player->>Player: Chapter 2 ends
    Player->>MQTT: Publish: Chapter complete
    Player->>Server: HTTP GET: /audio/chapter3.mp3
    Server-->>Player: Stream audio (Chapter 3)
    Player->>MQTT: Publish: Now playing<br/>Chapter 3

    Note over User,Server: Book Completion
    Player->>Player: Chapter 3 ends
    Player->>MQTT: Publish: Playback complete<br/>All chapters finished
    MQTT->>Server: Forward: Complete event
    Player->>Player: Return to idle state
    deactivate Player

    Note over User,Server: Card Removed
    User->>Player: Removes card
    Player->>MQTT: Publish: Card removed
    MQTT->>Server: Forward: Card removed

Key Points:

  • Player-Initiated Streaming: The Yoto player directly requests audio from your server via HTTP
  • MQTT for Events: All user interactions and status updates flow through Yoto's MQTT broker
  • Sequential Chapters: Chapters play in order and can be navigated using physical buttons
  • Real-time Monitoring: Your server receives live updates about playback state, position, and battery
  • Simple HTTP: Audio streaming uses standard HTTP GET requests - no special protocols needed
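
The chapter flow in the diagram can be sketched as a small state tracker. This is an illustration only, not the project's implementation: the `ChapterBook` class and the event names (`card_inserted`, `next_chapter`, `chapter_complete`, `card_removed`) are hypothetical labels chosen to mirror the diagram, and the real MQTT payloads may look quite different.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChapterBook:
    """Tracks which chapter a player is on (hypothetical helper)."""

    chapter_urls: List[str]
    current: Optional[int] = None  # index of the playing chapter, None when idle

    @property
    def current_url(self) -> Optional[str]:
        """URL of the chapter now playing, or None when the player is idle."""
        return None if self.current is None else self.chapter_urls[self.current]

    def handle(self, event: str) -> Optional[str]:
        """Apply one event (hypothetical names) and return the current URL."""
        if event == "card_inserted":
            # Start at the first chapter, if the card has any chapters at all.
            self.current = 0 if self.chapter_urls else None
        elif event == "card_removed":
            self.current = None
        elif event in ("next_chapter", "chapter_complete") and self.current is not None:
            nxt = self.current + 1
            # Moving past the last chapter returns the player to idle.
            self.current = nxt if nxt < len(self.chapter_urls) else None
        # Any other event (pause, resume, status) leaves the position unchanged.
        return self.current_url
```

Feeding it `card_inserted` and then `next_chapter` moves from the first URL to the second, while pause and resume events leave the position where it was.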

For interactive (Choose Your Own Adventure) stories where chapter selection depends on button choices, see the Interactive Cards section.

📋 Prerequisites

  • Python 3.9 or higher
  • A Yoto player and Yoto account
  • Yoto API client ID (get from yoto.dev)

🚀 Quick Start

1. Clone the Repository

git clone https://github.com/earchibald/yoto-smart-stream.git
cd yoto-smart-stream

2. Set Up Environment

# Copy environment template
cp .env.example .env

# Edit .env and add your Yoto client ID
# YOTO_CLIENT_ID=your_client_id_here

3. Install Dependencies

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

4. Authenticate with Yoto API

# Run the simple client example to authenticate
python examples/simple_client.py

Follow the prompts to authenticate. Your refresh token will be saved for future use.

5. Start the Production Server

# Run the production server (recommended)
python -m yoto_smart_stream

# Or use uvicorn directly (--reload is intended for development; omit it in production)
uvicorn yoto_smart_stream.api:app --reload --port 8080

# Or run the example server (for learning/testing)
python examples/basic_server.py

Visit http://localhost:8080/docs for interactive API documentation.

📚 Documentation

Quick Start & Testing

Cloud Deployment

Creating Content

API & Implementation

  • Yoto API Reference: Complete API specification with endpoints, MQTT topics, and code examples
  • Yoto MQTT Reference: Deep dive into MQTT event service implementation and real-time communication

Architecture & Planning

Custom Skills

This project includes specialized AI skills for development workflows:

See Copilot Instructions for detailed skill usage.

🛠️ Development

Using GitHub Codespaces

This project is configured for GitHub Codespaces with a complete development environment:

  1. Click "Code" → "Create codespace on main"
  2. Wait for the environment to set up automatically
  3. Start developing!

Note: GitHub Copilot Workspace has network access configured to test Railway deployments directly. See Copilot Workspace Network Configuration for details.

Local Development

# Install development dependencies
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

# Run tests
pytest

# Run linter
ruff check .

# Format code
black .

☁️ Deployment

Railway.app

Deploy to Railway with automated CI/CD and environment-specific tokens:

# Staging: Automatic on push to develop branch
git push origin develop

# Development: Manual with coordination
# Via GitHub Actions: Railway Development (Shared Environment) workflow

# Pull Requests: Automatic via Railway native PR Environments
# Open a PR → Railway creates pr-{number} environment automatically

Persistent Storage: Railway volumes are configured to persist Yoto OAuth tokens across deployments and restarts. Tokens are stored in /data/.yoto_refresh_token on Railway, ensuring authentication survives instance restarts.
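
One way to persist a refresh token at that path is sketched below. It is a minimal example under stated assumptions, not the project's code: the function names and the `YOTO_TOKEN_FILE` override variable are hypothetical; only the default path comes from the paragraph above.

```python
import os
from pathlib import Path
from typing import Optional

# Default path taken from the text above.
DEFAULT_TOKEN_PATH = "/data/.yoto_refresh_token"


def token_path() -> Path:
    """Return the token file path; YOTO_TOKEN_FILE is a hypothetical override."""
    return Path(os.environ.get("YOTO_TOKEN_FILE", DEFAULT_TOKEN_PATH))


def save_refresh_token(token: str, path: Optional[Path] = None) -> None:
    """Write the token to a temp file, then rename it over the target."""
    path = path or token_path()
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(token, encoding="utf-8")
    os.chmod(tmp, 0o600)  # readable by the owner only
    os.replace(tmp, path)  # replaces the old file in one step


def load_refresh_token(path: Optional[Path] = None) -> Optional[str]:
    """Return the stored token, or None if no usable token is on disk."""
    path = path or token_path()
    try:
        token = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return token or None
```

Writing to a temporary file and renaming it means a restart in the middle of a write is less likely to leave a half-written token behind.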

PR Environments: Railway automatically creates ephemeral environments for pull requests with zero configuration. See Railway PR Environments Guide.

Static Environments: Uses pre-registered callback URLs for Yoto OAuth compatibility.

Token Security: Production uses a single Railway token (RAILWAY_TOKEN_PROD). Application secrets like YOTO_CLIENT_ID are stored as Railway Shared Variables. See GitHub Secrets Setup.

Status:

  • ✅ Production (main branch) - Auto-deployed with RAILWAY_TOKEN_PROD
  • ✅ PR Environments (all PRs) - Auto-created by Railway native feature, inherits secrets via Shared Variables

Resources:

Health Check

All deployments include a health check endpoint:

curl https://your-app.up.railway.app/health

📖 Examples

Basic Player Control

from yoto_api import YotoManager

# Initialize and authenticate
ym = YotoManager(client_id="your_client_id")
ym.set_refresh_token("your_refresh_token")
ym.check_and_refresh_token()

# Get players
ym.update_player_status()
for player_id, player in ym.players.items():
    print(f"{player.name}: {'Online' if player.online else 'Offline'}")

# Control a player (player_id here is the last ID from the loop above)
ym.pause_player(player_id)
ym.play_player(player_id)
ym.set_volume(player_id, 10)

Listen to MQTT Events

python examples/mqtt_listener.py

Start API Server

python examples/basic_server.py

Then use the API:

# List players
curl http://localhost:8080/api/players

# Control a player
curl -X POST http://localhost:8080/api/players/{player_id}/control \
  -H "Content-Type: application/json" \
  -d '{"action": "pause"}'
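
The same control call can be made from Python using only the standard library. This sketch only builds the request; the helper name `build_control_request` is hypothetical, and the endpoint and body are copied from the curl example above.

```python
import json
import urllib.request
from urllib.parse import quote


def build_control_request(base_url: str, player_id: str, action: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/players/{player_id}/control."""
    # Percent-encode the player ID so unusual characters cannot break the path.
    url = f"{base_url.rstrip('/')}/api/players/{quote(player_id, safe='')}/control"
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=10)` while the server is running.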

🎨 Creating Custom MYO Cards

Create your own custom audio cards for Yoto players using one of two approaches:

Option 1: Stream from Your Own Service (Recommended)

Point cards to your own streaming URL for complete control:

# Create card that streams from your server
card_data = {
    "title": "My Streaming Story",
    "content": {
        "chapters": [{
            "tracks": [{
                "url": "https://your-server.com/audio/story.mp3"  # YOUR server!
            }]
        }]
    }
}

Benefits:

  • ✅ Update audio without recreating cards
  • ✅ Dynamic content (time-based, personalized, etc.)
  • ✅ No file size limits
  • ✅ Complete control over content

Quick Start: See Streaming from Your Own Service
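
For a card with several chapters, the payload shown above can be generated from a list of URLs. This is a sketch that mirrors only the fields in that example; `build_streaming_card` is a hypothetical helper, and the real Yoto API may require additional fields not shown here.

```python
from typing import Any, Dict, List


def build_streaming_card(title: str, track_urls: List[str]) -> Dict[str, Any]:
    """Build a card payload with one single-track chapter per URL."""
    if not track_urls:
        raise ValueError("at least one track URL is required")
    return {
        "title": title,
        "content": {
            # Same nesting as the example above: chapters -> tracks -> url
            "chapters": [{"tracks": [{"url": url}]} for url in track_urls]
        },
    }


card_data = build_streaming_card(
    "My Streaming Story",
    [
        "https://your-server.com/audio/chapter1.mp3",
        "https://your-server.com/audio/chapter2.mp3",
    ],
)
```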

Option 2: Upload to Yoto's Servers (Traditional)

Upload audio files to Yoto's storage:

from yoto_api import YotoManager

# Authenticate
ym = YotoManager(client_id="your_client_id")
ym.set_refresh_token("your_refresh_token")

# Create a custom card with your audio
# 1. Calculate file hash
# 2. Get upload URL
# 3. Upload audio file
# 4. Create card with metadata
# 5. Play on device
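
Step 1 (calculating the file hash) might look like the sketch below. It assumes the upload flow expects a SHA-256 hex digest, which is not stated in this README; confirm the algorithm against the Creating MYO Cards guide. The helper name `file_sha256` is hypothetical.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks.

    Assumption: SHA-256 is the hash the upload step expects.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large audio files are not loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```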

Complete Guide: See Creating MYO Cards for detailed instructions including:

  • Audio file preparation and upload
  • Cover image creation
  • Multi-chapter card creation
  • Complete Python code examples
  • Troubleshooting tips

🎮 Interactive Cards (Choose Your Own Adventure)

Create interactive stories that respond to button presses:

{
  "card_id": "adventure-001",
  "chapters": {
    "1": {
      "audio_file_id": "intro.mp3",
      "choices": {
        "left": {"next_chapter": 2},
        "right": {"next_chapter": 3}
      }
    },
    "2": {
      "audio_file_id": "left-path.mp3",
      "choices": {
        "left": {"next_chapter": 4},
        "right": {"next_chapter": 5}
      }
    }
  }
}

See Architecture Guide for detailed implementation.
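
The script format above can be walked with a few lines of Python. This is a minimal sketch, not the project's Script Engine: the `next_chapter` function is hypothetical, and it assumes button names match the keys under `choices` (`left`, `right`). Note that chapter keys in the JSON are strings while `next_chapter` values are integers, so the lookup converts between them.

```python
import json
from typing import Optional

SCRIPT = json.loads("""
{
  "card_id": "adventure-001",
  "chapters": {
    "1": {
      "audio_file_id": "intro.mp3",
      "choices": {"left": {"next_chapter": 2}, "right": {"next_chapter": 3}}
    },
    "2": {
      "audio_file_id": "left-path.mp3",
      "choices": {"left": {"next_chapter": 4}, "right": {"next_chapter": 5}}
    }
  }
}
""")


def next_chapter(script: dict, current: int, button: str) -> Optional[int]:
    """Return the chapter a button press leads to, or None if there is no such choice."""
    chapter = script["chapters"].get(str(current))  # JSON object keys are strings
    if chapter is None:
        return None
    choice = chapter.get("choices", {}).get(button)
    return None if choice is None else choice["next_chapter"]
```

Returning `None` for an unknown chapter or button lets the caller decide what to do, for example replaying the current chapter.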

🏗️ Project Structure

yoto-smart-stream/
├── .devcontainer/          # GitHub Codespaces configuration
├── .github/
│   ├── skills/            # Custom AI skills for Railway, Yoto API, and testing
│   └── workflows/         # GitHub Actions CI/CD
├── docs/                  # Documentation
├── examples/              # Example scripts
│   ├── simple_client.py   # Basic API usage
│   ├── mqtt_listener.py   # Event monitoring
│   └── basic_server.py    # FastAPI server
├── yoto_smart_stream/     # Main package (production server)
├── tests/                 # Test suite
├── pyproject.toml         # Project configuration
├── requirements.txt       # Dependencies
└── README.md             # This file

🤝 Contributing

Contributions are welcome! Please read our contributing guidelines before submitting PRs.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests and linting
  5. Submit a pull request

Keeping Documentation Up to Date

When making significant architectural changes:

  1. Update the Architecture Diagram: Edit the Mermaid diagram in README.md (🏗️ Architecture section)
  2. Update Architecture Docs: Sync changes with docs/ARCHITECTURE.md for detailed explanations
  3. Key Areas to Check:
    • New services or components
    • Changed data flows
    • New external integrations
    • Modified APIs or protocols

The architectural diagram uses Mermaid syntax and renders automatically on GitHub. Test your changes locally with a Mermaid preview tool or GitHub's preview feature.

For detailed instructions, see the Architecture Diagram Maintenance Guide.

📝 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

  • yoto_api by cdnninja - Python wrapper for Yoto API
  • Yoto Play for creating an amazing audio player for kids
  • Community contributors and testers

⚠️ Disclaimer

This project is not affiliated with, endorsed by, or sponsored by Yoto Play. It's an independent community project built using publicly available APIs.

📞 Support

🗺️ Roadmap

  • Project setup and documentation
  • Basic API client examples
  • Icon management module (100% complete, 96% test coverage)
  • FastAPI server implementation with lifespan management
  • MQTT event monitoring
  • Comprehensive testing suite (137 tests passing)
  • Code quality tooling (ruff, black, pytest, mypy)
  • Quick start and testing guides
  • Dynamic audio streaming with queue management
  • Text-to-speech integration
  • Speech-to-text transcription (OpenAI Whisper)
  • Audio management system (basic upload/conversion)
  • Interactive script engine
  • Web UI for queue management
  • Queue persistence (database storage)
  • Cloud deployment guides
  • Progressive Web App (PWA) for mobile devices

📱 Progressive Web App (PWA)

Yoto Smart Stream is now available as a Progressive Web App! Install it on your mobile device for an app-like experience without going through an app store.

Features

  • Install on iOS/Android: Add to home screen for quick access
  • Offline Support: Core features work without internet connection
  • Native Feel: Full-screen app experience without browser UI
  • Auto Updates: Always get the latest version automatically
  • Mobile Optimized: Touch-friendly interface with 44px touch targets
  • Cross-Platform: Works on iOS, Android, and Desktop

Quick Installation

iOS (Safari):

  1. Visit your Yoto Smart Stream URL
  2. Tap Share → "Add to Home Screen"
  3. Launch from your home screen

Android (Chrome):

  1. Visit your Yoto Smart Stream URL
  2. Tap "Install" prompt or Menu → "Install app"
  3. Launch from your app drawer

For detailed installation instructions, troubleshooting, and technical details, see the PWA Mobile Installation Guide.

Speech-to-Text Transcription

Yoto Smart Stream supports automatic transcription of audio files using OpenAI Whisper. Transcription is disabled by default to keep container builds fast; enable it only when you need STT.

Features

  • Automatic Transcription: Audio files are automatically transcribed when uploaded (when enabled)
  • Manual Transcription: Trigger transcription on-demand for any audio file
  • Transcript Storage: Transcripts are stored in the database and associated with audio files
  • UI Integration: View, manage, and retry transcriptions from the Audio Library page
  • TTS Integration: Text-to-speech generated audio automatically stores the source text as transcript

Usage

Enabling transcription

  1. Install optional dependencies (not included by default to keep builds small):
     pip install openai-whisper torch torchaudio
  2. Set the feature flag:
     export TRANSCRIPTION_ENABLED=true
  3. (Optional) Choose a model:
     export TRANSCRIPTION_MODEL=base  # tiny|base|small|medium|large
  4. Restart the app so the settings take effect.
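
A sketch of how these two variables could be read and validated at startup follows. The variable names and model list come from the steps above; the `TranscriptionSettings` type, the function name, and the set of accepted truthy strings are assumptions, and the application may parse them differently.

```python
import os
from typing import Mapping, NamedTuple

VALID_MODELS = ("tiny", "base", "small", "medium", "large")


class TranscriptionSettings(NamedTuple):
    enabled: bool
    model: str


def load_transcription_settings(env: Mapping[str, str] = os.environ) -> TranscriptionSettings:
    """Read TRANSCRIPTION_ENABLED and TRANSCRIPTION_MODEL from the environment."""
    # Assumption: these strings count as "on"; anything else means disabled.
    raw_enabled = env.get("TRANSCRIPTION_ENABLED", "false").strip().lower()
    enabled = raw_enabled in ("1", "true", "yes", "on")
    model = env.get("TRANSCRIPTION_MODEL", "base").strip().lower()
    if model not in VALID_MODELS:
        raise ValueError(f"TRANSCRIPTION_MODEL must be one of {VALID_MODELS}, got {model!r}")
    return TranscriptionSettings(enabled, model)
```

Failing fast on an unknown model name surfaces a typo at startup rather than on the first upload.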

Automatic Transcription

When you upload an audio file via the Audio Library page and transcription is enabled, it starts automatically:

  1. Navigate to the Audio Library page
  2. Upload or record an audio file
  3. The system will automatically transcribe the audio in the background
  4. Once complete, a "📝 Transcript" button appears next to the audio file

Manual Transcription

To manually trigger transcription for an existing audio file:

  1. Navigate to the Audio Library page
  2. Find the audio file you want to transcribe
  3. Click the "📝 Transcribe" button
  4. Wait for the transcription to complete
  5. Click "📝 Transcript" to view the result

Viewing Transcripts

  1. Click the "📝 Transcript" button next to any audio file with a completed transcription
  2. The transcript will appear in a modal dialog
  3. You can copy the text or close the modal

API Endpoints

Get Transcript

GET /api/audio/{filename}/transcript

Returns the transcript for a specific audio file.

Response:

{
  "filename": "my-audio.mp3",
  "transcript": "This is the transcribed text...",
  "status": "completed",
  "error": null,
  "transcribed_at": "2026-01-14T08:00:00Z"
}
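
A client might build the endpoint URL and read the response like this. The helper names are hypothetical; the path and the field names come from the example above. Since only the `completed` status is documented here, the sketch treats any other status as "no transcript yet".

```python
import json
from typing import Optional
from urllib.parse import quote


def transcript_url(base_url: str, filename: str) -> str:
    """Build the GET /api/audio/{filename}/transcript URL."""
    # Percent-encode the filename so spaces and slashes stay inside one path segment.
    return f"{base_url.rstrip('/')}/api/audio/{quote(filename, safe='')}/transcript"


def transcript_text(response_body: str) -> Optional[str]:
    """Return the transcript from a JSON response, or None if it is not completed."""
    data = json.loads(response_body)
    if data.get("status") != "completed":
        return None
    return data.get("transcript")
```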

Trigger Transcription

POST /api/audio/{filename}/transcribe

Manually trigger transcription for an audio file.

Response:

{
  "success": true,
  "filename": "my-audio.mp3",
  "status": "completed",
  "transcript_length": 1234,
  "message": "Transcription completed successfully"
}

Configuration

The transcription service uses the OpenAI Whisper "base" model by default, which provides a good balance between speed and accuracy. To use a different model, set the TRANSCRIPTION_MODEL environment variable to one of the options below.

Model Options:

  • tiny - Fastest, lowest accuracy
  • base - Good balance (default)
  • small - Better accuracy, slower
  • medium - High accuracy, much slower
  • large - Best accuracy, very slow

Technical Details

  • Model: OpenAI Whisper (base by default)
  • Dependencies: openai-whisper, torch, torchaudio
  • Storage: SQLite database with transcript text, status, and metadata
  • Processing: Currently synchronous (TODO: move to background queue for production)

Limitations

  • Transcription currently runs synchronously and may cause request timeouts for very long audio files
  • For production deployments, consider implementing a background task queue (Celery, RQ, or FastAPI BackgroundTasks)
  • Whisper models require significant CPU/GPU resources for faster transcription
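
To illustrate moving transcription off the request path, here is a sketch built on the standard library's `ThreadPoolExecutor` rather than Celery, RQ, or FastAPI BackgroundTasks. The `TranscriptionQueue` class is hypothetical, and the `pending`, `error`, and `unknown` status names are assumptions; only `completed` appears in the API examples above.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional


class TranscriptionQueue:
    """Runs a transcribe(filename) -> text callable in worker threads (hypothetical)."""

    def __init__(self, transcribe: Callable[[str], str], workers: int = 1) -> None:
        self._transcribe = transcribe
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._jobs: Dict[str, Dict[str, Optional[str]]] = {}

    def submit(self, filename: str) -> Future:
        """Queue a file and return immediately; the Future completes when done."""
        with self._lock:
            self._jobs[filename] = {"status": "pending", "transcript": None, "error": None}
        return self._pool.submit(self._run, filename)

    def _run(self, filename: str) -> None:
        try:
            text = self._transcribe(filename)
            result = {"status": "completed", "transcript": text, "error": None}
        except Exception as exc:  # record the failure instead of losing it
            result = {"status": "error", "transcript": None, "error": str(exc)}
        with self._lock:
            self._jobs[filename] = result

    def status(self, filename: str) -> Dict[str, Optional[str]]:
        """Return a copy of the job record, or an 'unknown' record if never queued."""
        default = {"status": "unknown", "transcript": None, "error": None}
        with self._lock:
            return dict(self._jobs.get(filename, default))
```

With openai-whisper installed, the callable passed in could wrap `whisper.load_model("base").transcribe(path)["text"]`. A single worker keeps CPU-heavy jobs from competing with each other.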

Made with ❤️ for the Yoto community
