diff --git a/docs/guides/claude-code-plugin.md b/docs/guides/claude-code-plugin.md new file mode 100644 index 0000000..9803ed9 --- /dev/null +++ b/docs/guides/claude-code-plugin.md @@ -0,0 +1,175 @@ +# Claude Code Plugin + +VoiceMode provides an official plugin for Claude Code that enables voice conversations directly within the CLI. + +## What the Plugin Provides + +The VoiceMode plugin includes: + +- **MCP Server** - Full voice capabilities via the `voicemode-mcp` server +- **Slash Commands** - Quick access to common operations +- **Skill File** - Documentation and usage patterns for Claude +- **Hooks** - Sound feedback during tool execution + +## Installation + +### From the Plugin Marketplace + +The plugin is published to the Claude Code plugin marketplace: + +```bash +# Add from marketplace +/plugin marketplace add mbailey/voicemode + +# Install the plugin +/plugin install voicemode +``` + +### From Local Development + +If you're developing or have VoiceMode cloned locally: + +```bash +# Add plugin from local path +/plugin marketplace add /path/to/voicemode/plugins/voicemode + +# Install the plugin +/plugin install voicemode +``` + +## Prerequisites + +The plugin requires VoiceMode services to be installed and running. 
After installing the plugin, use the install command: + +```bash +/voicemode:install +``` + +This runs the VoiceMode installer which sets up: + +- **Whisper.cpp** - Local speech-to-text +- **Kokoro** - Local text-to-speech +- **FFmpeg** - Audio processing (via Homebrew on macOS) + +Alternatively, install services manually: + +```bash +# Run the installer directly +curl -sL https://voicemode.ai/install.sh | bash + +# Or install individual services +voicemode whisper service install +voicemode kokoro install +``` + +## Slash Commands + +| Command | Description | +|---------|-------------| +| `/voicemode:install` | Install VoiceMode and dependencies | +| `/voicemode:converse` | Start a voice conversation | +| `/voicemode:status` | Check service status | +| `/voicemode:start` | Start voice services | +| `/voicemode:stop` | Stop voice services | + +### Starting a Conversation + +```bash +# Start with a greeting +/voicemode:converse Hello, how can I help you today? + +# Just start listening +/voicemode:converse +``` + +### Checking Status + +```bash +/voicemode:status +``` + +Shows whether Whisper (STT) and Kokoro (TTS) services are running and healthy. 
+ +## MCP Tools + +Once installed, Claude has access to these MCP tools: + +- `mcp__voicemode__converse` - Speak and listen for responses +- `mcp__voicemode__service` - Manage voice services + +### Converse Tool Parameters + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `message` | (required) | Text for Claude to speak | +| `wait_for_response` | true | Listen for user response after speaking | +| `listen_duration_max` | 120 | Maximum recording time (seconds) | +| `voice` | auto | TTS voice name | +| `vad_aggressiveness` | 2 | Voice detection strictness (0-3) | + +## Hooks and Soundfonts + +The plugin includes a hook receiver that plays sounds during tool execution: + +- Sounds play when tools start and complete +- Provides audio feedback during long operations +- Uses configurable soundfonts + +Hooks are automatically configured when the plugin is installed. + +## Troubleshooting + +### Services Not Starting + +Check individual service status: + +```bash +voicemode whisper service status +voicemode kokoro service status +``` + +View logs: + +```bash +voicemode whisper service logs +voicemode kokoro service logs +``` + +### No Audio Output + +1. Ensure your system audio is working +2. Check that Kokoro service is running +3. Verify FFmpeg is installed: `which ffmpeg` + +### Speech Not Recognized + +1. Ensure Whisper service is running +2. Check microphone permissions for Terminal/Claude Code +3. 
Try speaking more clearly or adjusting VAD aggressiveness + +## Configuration + +VoiceMode respects configuration from `~/.voicemode/voicemode.env`: + +```bash +# Default TTS voice +VOICEMODE_TTS_VOICE=nova + +# Whisper model (base, small, medium, large) +VOICEMODE_WHISPER_MODEL=base + +# Override thread count for Whisper +VOICEMODE_WHISPER_THREADS=4 +``` + +Edit configuration: + +```bash +voicemode config edit +``` + +## Resources + +- [VoiceMode Documentation](https://voicemode.ai/docs) +- [GitHub Repository](https://github.com/mbailey/voicemode) +- [Plugin Source](https://github.com/mbailey/voicemode/tree/master/plugins/voicemode) diff --git a/plugins/voicemode/commands/converse.md b/plugins/voicemode/commands/converse.md index 2bcc218..f794c74 100644 --- a/plugins/voicemode/commands/converse.md +++ b/plugins/voicemode/commands/converse.md @@ -5,19 +5,10 @@ argument-hint: [message] # /voicemode:converse -Start an ongoing voice conversation with the user using the `mcp__voicemode__converse` tool. +Start an ongoing voice conversation with the user using the `voicemode:converse` MCP tool. -## Example +## Implementation -```json -{ - "message": "Hello! What would you like to work on?", - "wait_for_response": true -} -``` +Use the `voicemode:converse` tool with the user's message. All parameters have sensible defaults. -All other parameters have sensible defaults - just set the message. - -## Troubleshooting - -If voice services aren't working, load the `voicemode` skill for troubleshooting guidance and installation instructions. +If the tool call fails or you need more information about voice capabilities, load the `voicemode` skill for complete documentation. diff --git a/plugins/voicemode/commands/install.md b/plugins/voicemode/commands/install.md index 43a96be..c1c85c7 100644 --- a/plugins/voicemode/commands/install.md +++ b/plugins/voicemode/commands/install.md @@ -9,54 +9,10 @@ Install VoiceMode and its dependencies. 
## Implementation -### Step 1: Install VoiceMode Package +Load the VoiceMode skill and follow the installation instructions in the "Install Services if Needed" section. -Run the installer with `--yes` flag (required for non-interactive environments like Claude Code): - -```bash -uvx voice-mode-install --yes -``` - -This installs the VoiceMode package and CLI. It does NOT install local speech services. - -### Step 2: Install Local Speech Services (Optional) - -After the VoiceMode package is installed, optionally install local STT and TTS services. - -**IMPORTANT**: These commands do NOT support a `--yes` flag - they are already non-interactive. - -Install Whisper for local speech-to-text: -```bash -voicemode whisper service install -``` - -Install Kokoro for local text-to-speech: -```bash -voicemode kokoro install -``` - -Both installations can take several minutes as they download models. - -### Step 3: Verify Installation - -After installation completes: - -1. Check service status: - ```bash - voicemode whisper service status - voicemode kokoro service status - ``` - -2. 
Start services if needed: - ```bash - voicemode whisper service start - voicemode kokoro service start - ``` - -## Notes - -- The `voice-mode-install` script requires `--yes` flag in non-interactive mode -- The `voicemode whisper/kokoro install` commands are already non-interactive -- Local services require ~2-3GB disk space for models -- Installation requires network access for downloads -- Services will be configured to start automatically on login +The skill contains complete guidance for: +- Installing the VoiceMode package +- Detecting Apple Silicon for local service recommendations +- Installing Whisper (STT) and Kokoro (TTS) with download size estimates +- Verifying installation and troubleshooting diff --git a/plugins/voicemode/skills/voicemode/SKILL.md b/plugins/voicemode/skills/voicemode/SKILL.md index 7b364ec..d54d8af 100644 --- a/plugins/voicemode/skills/voicemode/SKILL.md +++ b/plugins/voicemode/skills/voicemode/SKILL.md @@ -7,6 +7,62 @@ description: This skill provides voice interaction capabilities for AI assistant Voice interaction capabilities for Claude Code - enabling natural conversations through speech-to-text (STT) and text-to-speech (TTS) services. 
+## Naming Clarification + +There are two related names to be aware of: + +| Name | What it is | Example usage | +|------|------------|---------------| +| `voicemode` | CLI command (no hyphen) | `voicemode whisper service status` | +| `voice-mode` | Python package on PyPI (with hyphen) | `uvx voice-mode-install` | + +**Check if CLI is installed:** +```bash +which voicemode # Should show path like ~/.local/bin/voicemode +voicemode --version # Should show version number +``` + +**If not installed:** +```bash +# Option 1: Install permanently +uv tool install voice-mode + +# Option 2: Run without installing (uses uvx) +uvx voice-mode # Equivalent to: voicemode +``` + +## When to Use MCP Tools vs CLI + +| Use Case | Recommended | Why | +|----------|-------------|-----| +| Voice conversations | MCP (`voicemode:converse`) | Faster - MCP server already running | +| Service management | CLI (`voicemode service`) | Works without MCP server | +| Installation | CLI (`voice-mode-install`) | One-time setup | +| Model management | CLI (`voicemode whisper model`) | Administrative task | +| Configuration | CLI (`voicemode config`) | Edit settings directly | + +## Claude Code Plugin + +VoiceMode is available as a Claude Code plugin from the marketplace: + +```bash +# Install from marketplace +/plugin marketplace add mbailey/voicemode +/plugin install voicemode +``` + +The plugin provides: +- **MCP Server** - Full voice capabilities via `voicemode-mcp` +- **Slash Commands** - `/voicemode:converse`, `/voicemode:status`, etc. +- **Hooks** - Sound feedback during tool execution + +After installing the plugin, install voice services: +```bash +/voicemode:install +``` + +For detailed plugin documentation, see `docs/guides/claude-code-plugin.md` in the voicemode repo. 
+ ## Quick Start When a user wants to use voice mode for the first time, guide them through these steps: @@ -35,24 +91,71 @@ If services aren't installed, guide the user to install them: ```bash # Install VoiceMode with UV (recommended) -uv tool install voice-mode-install -voice-mode-install - -# Or update to latest version -voicemode update +uvx voice-mode-install --yes ``` -**Install Voice Services:** +This installs the VoiceMode package and CLI. It does NOT install local speech services. + +#### Local Voice Services (Apple Silicon Recommended) + +**When to offer local services:** +- On Apple Silicon Macs, local services are highly recommended - they provide privacy, speed, and work offline +- Check architecture with: `uname -m` (arm64 = Apple Silicon) +- If Apple Silicon, ask the user: "Would you like to install local voice services? This provides faster, private, offline voice capabilities." + +**Get informed consent before installing:** + +Tell the user what will be downloaded: + +| Service | Download Size | Disk Space | First Start Time | +|---------|---------------|------------|------------------| +| Whisper (tiny) | ~75MB | ~150MB | 30 seconds | +| Whisper (base) | ~150MB | ~300MB | 1-2 minutes | +| Whisper (small) | ~460MB | ~1GB | 2-3 minutes | +| Kokoro TTS | ~350MB | ~700MB | 2-3 minutes | + +**Recommended setup for most users:** Whisper base + Kokoro = ~500MB download, ~1GB disk space. + +After user consents, install services: ```bash -# Install Whisper for local STT +# Install Whisper for local STT (base model recommended) voicemode whisper service install # Install Kokoro for local TTS voicemode kokoro install ``` -Services auto-start after installation. +Services auto-start after installation and are configured to start on login. + +**First Run - Model Downloads:** + +When services start for the first time, they download AI models. The first `converse` call may be slow while models load. Subsequent starts are instant. 
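Because the first start can take a few minutes, a caller may prefer a bounded wait over retrying `converse` blindly. A minimal sketch, assuming `nc` is available and using ports 2022 (Whisper) and 8880 (Kokoro) as listed elsewhere in this skill; `retry_until` is a hypothetical helper, not part of the VoiceMode CLI:

```bash
#!/usr/bin/env bash
# retry_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND about once per second until it succeeds, giving up after
# roughly TIMEOUT_SECONDS. Hypothetical helper, not a VoiceMode command.
retry_until() {
  timeout_s=$1; shift
  elapsed=0
  until "$@" 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1   # gave up: the command never succeeded
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Usage (assumes `nc` is installed):
#   retry_until 300 nc -z localhost 2022 && echo "Whisper is ready"
#   retry_until 300 nc -z localhost 8880 && echo "Kokoro is ready"
```

The timeout keeps an agent from hanging indefinitely if a model download stalls; a non-zero return is a cue to inspect the service logs.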
+ +**Check Model Download Progress:** + +```bash +# Whisper model location - check if download complete +ls -lh ~/.voicemode/services/whisper/models/ + +# Kokoro model location +ls -lh ~/.voicemode/services/kokoro/models/ + +# Watch service logs during download +voicemode whisper service logs -f +voicemode kokoro logs -f +``` + +**Choose a Different Whisper Model:** + +```bash +# Smaller/faster (good for testing) +voicemode whisper install --model tiny # ~75MB + +# Larger/more accurate +voicemode whisper install --model small # ~460MB +voicemode whisper install --model medium # ~1.5GB +``` ### 3. Start Your First Conversation @@ -238,19 +341,50 @@ voicemode config edit - User-level: `~/.voicemode` file in home directory - System config: `~/.voicemode/config/config.yaml` -## Advanced Topics +## Provider Options + +VoiceMode supports both cloud and local voice services. You can use either or both. + +### OpenAI API (Cloud) + +If `OPENAI_API_KEY` is set, VoiceMode can use OpenAI's cloud services: +- **STT**: OpenAI Whisper API +- **TTS**: OpenAI voices (alloy, echo, fable, onyx, nova, shimmer) + +This works without installing local services - just set the API key. + +### Local Services -### Provider System +For privacy, speed, and offline use, install local services: -VoiceMode uses OpenAI-compatible endpoints for all services: +| Service | Port | Purpose | +|---------|------|---------| +| Whisper | 2022 | Speech-to-text (STT) | +| Kokoro | 8880 | Text-to-speech (TTS) | -**Cloud Providers:** -- OpenAI API (requires API key) +### Provider Priority -**Local Providers:** -- Whisper.cpp for STT -- Kokoro for TTS -- LiveKit for WebRTC communication +VoiceMode automatically selects providers based on availability: +1. If local services are running, they're used by default (faster, private) +2. If local services aren't available, falls back to OpenAI API (if key is set) +3. 
You can override with `tts_provider` and `stt_provider` parameters + +### Checking Provider Status + +```bash +# Check what providers are available +voicemode diag registry + +# Check specific service ports +nc -z localhost 2022 && echo "Whisper running" || echo "Whisper not running" +nc -z localhost 8880 && echo "Kokoro running" || echo "Kokoro not running" +``` + +## Advanced Topics + +### Provider System Details + +VoiceMode uses OpenAI-compatible endpoints for all services, enabling seamless switching between providers. The system automatically: - Discovers available providers @@ -370,6 +504,68 @@ voicemode diag info voicemode diag devices ``` +### Conversation History Search + +VoiceMode logs all exchanges and provides powerful search capabilities to find and replay past conversations. + +**Load conversation history into SQLite:** + +```bash +# Load all new exchanges since last sync +voicemode history load + +# Load all exchanges (ignore last sync) +voicemode history load --all + +# Load from specific date +voicemode history load --since 2025-12-01 + +# Load last 7 days +voicemode history load --days 7 +``` + +**Search conversations:** + +```bash +# Full-text search +voicemode history search "minion indirectly" + +# Search only agent speech (TTS) +voicemode history search --type tts "hello" + +# Search only user speech (STT) +voicemode history search --type stt "hello" + +# Search specific date +voicemode history search --date 2025-12-27 "keyword" + +# Search and play first result automatically +voicemode history search --play "memorable quote" + +# Limit results +voicemode history search --limit 50 "conversation" +``` + +**Play audio clips:** + +```bash +# Play by exchange ID (from search results) +voicemode history play ex_abc123def456 +``` + +**Search Features:** +- Full-text search using SQLite FTS5 (fast, supports complex queries) +- Filter by type (stt/tts), date, or conversation +- Audio files automatically resolved from timestamp +- Incremental loading 
- won't duplicate already-loaded exchanges +- All conversations stored in `~/.voicemode/cache/conversations.db` + +**Use Cases:** +- Find memorable moments or important discussions +- Review what was said in past conversations +- Create clips of agent responses for testing +- Debug conversation issues by reviewing exact exchanges + ### Token Efficiency Tip When using CLI commands directly (not MCP tools), redirect STDERR to save tokens: @@ -414,6 +610,19 @@ For detailed documentation: ## Troubleshooting +**First conversation is slow or times out:** + +This is normal on first run - the services are downloading AI models: +1. Check Whisper logs: `voicemode whisper service logs -f` +2. Check Kokoro logs: `voicemode kokoro logs -f` +3. Wait for downloads to complete (2-5 minutes total) +4. Subsequent starts will be instant + +**Model not loading:** +1. Check disk space: Models need ~1GB of disk space for Whisper base + Kokoro (~500MB download) +2. Verify model files exist: `ls -lh ~/.voicemode/services/whisper/models/` +3. Try reinstalling: `voicemode whisper install --model base` + **Services won't start:** 1. Check FFmpeg is installed: `ffmpeg -version` 2. View service logs: `voicemode:service("whisper", "logs")` diff --git a/plugins/voicemode/skills/voicemode/docs/installation.md b/plugins/voicemode/skills/voicemode/docs/installation.md new file mode 100644 index 0000000..815370a --- /dev/null +++ b/plugins/voicemode/skills/voicemode/docs/installation.md @@ -0,0 +1,159 @@ +# VoiceMode Installation Guide + +Complete installation guide for VoiceMode and local voice services. + +## Prerequisites + +Before installing VoiceMode, ensure you have: +- **FFmpeg**: Required for audio processing (`ffmpeg -version` to check) +- **Python 3.11+**: Required for VoiceMode package + +## Installing VoiceMode Package + +Run the installer with `--yes` flag (required for non-interactive environments like Claude Code): + +```bash +uvx voice-mode-install --yes +``` + +This installs the VoiceMode package and CLI. 
It does NOT install local speech services. + +## Using OpenAI API (Alternative to Local Services) + +If you have an `OPENAI_API_KEY` set, VoiceMode can use OpenAI's cloud services without installing local services: + +- **STT**: OpenAI Whisper API (cloud) +- **TTS**: OpenAI voices (alloy, echo, fable, onyx, nova, shimmer) + +This is useful when: +- You don't want to download large models +- You're not on Apple Silicon +- You prefer cloud services + +You can also use OpenAI as a fallback - if local services fail, VoiceMode will automatically try OpenAI if the API key is set. + +## Local Voice Services + +### Service Ports Reference + +| Service | Port | Purpose | +|---------|------|---------| +| Whisper | 2022 | Speech-to-text (STT) | +| Kokoro | 8880 | Text-to-speech (TTS) | + +### When to Install Local Services + +Local voice services (Whisper for STT, Kokoro for TTS) are recommended when: +- Running on **Apple Silicon Mac** (arm64) - optimal performance +- Privacy is important - audio stays on device +- Working offline or with unreliable internet +- Faster response times are desired + +Check architecture: +```bash +uname -m # arm64 = Apple Silicon +``` + +### Download Sizes and Requirements + +Get informed consent before installing. Here are the resource requirements: + +| Service | Download Size | Disk Space | First Start Time | +|---------|---------------|------------|------------------| +| Whisper (tiny) | ~75MB | ~150MB | 30 seconds | +| Whisper (base) | ~150MB | ~300MB | 1-2 minutes | +| Whisper (small) | ~460MB | ~1GB | 2-3 minutes | +| Whisper (medium) | ~1.5GB | ~3GB | 3-5 minutes | +| Kokoro TTS | ~350MB | ~700MB | 2-3 minutes | + +**Recommended setup**: Whisper base + Kokoro = ~500MB download, ~1GB disk space. 
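The recommended-setup totals follow directly from the table. A small sketch of that arithmetic, with the approximate sizes hard-coded from the table above; `download_mb`, `disk_mb`, and `total_mb` are hypothetical helpers for illustration, not VoiceMode commands:

```bash
#!/usr/bin/env bash
# Approximate sizes in MB, copied from the table above.

# Download size for one component.
download_mb() {
  case "$1" in
    tiny)   echo 75 ;;
    base)   echo 150 ;;
    small)  echo 460 ;;
    medium) echo 1500 ;;
    kokoro) echo 350 ;;
    *)      echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# Disk space for one component.
disk_mb() {
  case "$1" in
    tiny)   echo 150 ;;
    base)   echo 300 ;;
    small)  echo 1000 ;;
    medium) echo 3000 ;;
    kokoro) echo 700 ;;
    *)      echo "unknown component: $1" >&2; return 1 ;;
  esac
}

# total_mb SIZE_FUNCTION COMPONENT...
# Sum the given size function over a list of components.
total_mb() {
  fn=$1; shift
  total=0
  for component in "$@"; do
    size=$("$fn" "$component") || return 1
    total=$((total + size))
  done
  echo "$total"
}

echo "download: $(total_mb download_mb base kokoro) MB"   # 150 + 350
echo "disk:     $(total_mb disk_mb base kokoro) MB"       # 300 + 700
```

Swapping `base` for `small` or `medium` gives the figures to quote when asking for consent to a larger model.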
+ +### Installing Whisper (Speech-to-Text) + +```bash +# Install with base model (recommended) +voicemode whisper service install + +# Or specify a different model +voicemode whisper service install --model tiny # Faster, less accurate +voicemode whisper service install --model small # More accurate +voicemode whisper service install --model medium # Most accurate +``` + +### Installing Kokoro (Text-to-Speech) + +```bash +voicemode kokoro install +``` + +### Service Startup + +Services auto-start after installation and are configured to start on login. + +**First run behavior**: Services download AI models on first start. The first `converse` call may be slow while models load. Subsequent starts are instant. + +## Waiting for Services + +After installation, wait for services to be ready: + +**Wait for Whisper (port 2022):** +```bash +echo "Waiting for Whisper to be ready..." +while ! nc -z localhost 2022 2>/dev/null; do sleep 2; done +echo "Whisper is ready!" +``` + +**Wait for Kokoro (port 8880):** +```bash +echo "Waiting for Kokoro to be ready..." +while ! nc -z localhost 8880 2>/dev/null; do sleep 2; done +echo "Kokoro is ready!" +``` + +## Verifying Installation + +Check service status: +```bash +voicemode whisper service status +voicemode kokoro status +``` + +Check model files: +```bash +ls -lh ~/.voicemode/services/whisper/models/ +ls -lh ~/.voicemode/services/kokoro/models/ +``` + +Test voice conversation: +```bash +voicemode converse -m "Hello, voice mode is working!" +``` + +## Viewing Logs + +Monitor service logs during installation or troubleshooting: + +```bash +# Follow Whisper logs +voicemode whisper service logs -f + +# Follow Kokoro logs +voicemode kokoro logs -f +``` + +## Updating VoiceMode + +```bash +voicemode update +``` + +## Uninstalling Services + +```bash +voicemode whisper service uninstall +voicemode kokoro uninstall +``` + +## Troubleshooting + +See the troubleshooting section in the main VoiceMode skill for common issues. 
diff --git a/plugins/voicemode/skills/voicemode/docs/whisper.md b/plugins/voicemode/skills/voicemode/docs/whisper.md new file mode 100644 index 0000000..06809f7 --- /dev/null +++ b/plugins/voicemode/skills/voicemode/docs/whisper.md @@ -0,0 +1,137 @@ +# Whisper Reference + +Local speech-to-text (STT) service using Whisper.cpp. + +## Overview + +Whisper provides fast, accurate, private speech-to-text on your local machine. It's optimized for Apple Silicon but works on any platform. + +## Installation + +```bash +voicemode whisper service install +``` + +See @docs/installation.md for detailed installation guide with model options. + +## Service Management + +### Using MCP Tools + +```python +# Check status +voicemode:service("whisper", "status") + +# Start service +voicemode:service("whisper", "start") + +# Stop service +voicemode:service("whisper", "stop") + +# Restart service +voicemode:service("whisper", "restart") + +# View logs +voicemode:service("whisper", "logs", lines=50) + +# Enable auto-start on login +voicemode:service("whisper", "enable") + +# Disable auto-start +voicemode:service("whisper", "disable") +``` + +### Using CLI + +```bash +voicemode whisper service status +voicemode whisper service start +voicemode whisper service stop +voicemode whisper service restart +voicemode whisper service logs +voicemode whisper service logs -f # Follow logs +``` + +## Model Selection + +Whisper supports multiple model sizes, trading accuracy for speed: + +| Model | Size | Speed | Accuracy | Use Case | +|-------|------|-------|----------|----------| +| tiny | 75MB | Fastest | Good | Quick testing, low-power devices | +| base | 150MB | Fast | Better | **Recommended default** | +| small | 460MB | Medium | Very Good | Higher accuracy needs | +| medium | 1.5GB | Slow | Excellent | Professional transcription | + +### Changing Models + +```bash +# Install with different model +voicemode whisper service install --model small + +# Check current model +ls 
~/.voicemode/services/whisper/models/ +``` + +## Configuration + +### Port + +Whisper runs on port **2022** by default, exposing an OpenAI-compatible `/v1/audio/transcriptions` endpoint. + +### Model Location + +Models are stored in: +``` +~/.voicemode/services/whisper/models/ +``` + +## Troubleshooting + +### Service Won't Start + +1. Check if FFmpeg is installed: + ```bash + ffmpeg -version + ``` + +2. Check service logs: + ```bash + voicemode whisper service logs + ``` + +3. Verify model files exist: + ```bash + ls -lh ~/.voicemode/services/whisper/models/ + ``` + +### Model Download Stuck + +Check network connectivity and disk space. Try reinstalling: +```bash +voicemode whisper service uninstall +voicemode whisper service install +``` + +### Port Already in Use + +Check if another process is using port 2022: +```bash +lsof -i :2022 +``` + +### Slow Transcription + +- Try a smaller model (tiny or base) +- Ensure the service has completed initial model loading +- Check system resource usage + +## API Compatibility + +Whisper exposes an OpenAI-compatible endpoint: + +``` +POST http://localhost:2022/v1/audio/transcriptions +``` + +This allows VoiceMode to seamlessly switch between local Whisper and cloud OpenAI STT.
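
As an illustration of what OpenAI-compatible means in practice, the endpoint can be called with a multipart upload. A hedged sketch: the `file` and `model` form fields follow OpenAI's transcription API, and whether the local server requires or honours `model` is an assumption; `whisper_url` and `transcribe` are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Build the transcription URL. Host and port default to the values
# documented above (localhost, 2022).
whisper_url() {
  host=${1:-localhost}
  port=${2:-2022}
  echo "http://${host}:${port}/v1/audio/transcriptions"
}

# transcribe FILE: POST an audio file as multipart form data.
# The `model` field mirrors OpenAI's API; the local server may ignore it
# (assumption, not confirmed by this document).
transcribe() {
  curl -sS -X POST "$(whisper_url)" \
    -F "file=@$1" \
    -F "model=whisper-1"
}

# Usage (assumes the Whisper service is running and sample.wav exists):
#   transcribe sample.wav
```

Pointing `whisper_url` at a different host and port is the only change needed to target another OpenAI-compatible STT server.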