mbailey · dotCipher · Nov 1, 2025
diff --git a/docs/features/recording-visualization.md b/docs/features/recording-visualization.md
@@ -0,0 +1,181 @@
+# Recording Visualization
+
+VoiceMode includes real-time visual feedback during voice recording sessions, making it easy to see when the system is listening, when speech is detected, and how close you are to the silence threshold.
+
+## Features
+
+The recording visualization provides:
+
+- **Audio Level Meter**: Real-time display of microphone input levels (RMS)
+- **Duration Counter**: Shows current recording time vs. maximum duration
+- **Speech Detection Status**: Indicates whether speech has been detected
+- **State Indicator**: Shows current recording state with visual cues:
+  - 🔊 **WAITING** (yellow): Waiting for speech to begin
+  - 🎤 **ACTIVE** (green): Speech detected, actively recording
+  - ⏸️ **SILENCE** (blue): Silence after speech, counting down to stop
+- **Silence Progress Bar**: Shows accumulation toward the silence threshold
+- **Minimum Duration Progress**: Shows progress toward minimum recording duration
+
+## Example Display
+
+```
+╭─────────────────────────────── 🎤 Recording... ───────────────────────────────╮
+│                                                                              │
+│     Duration:  3.2s / 120.0s                                                 │
+│        State:  ACTIVE                                                        │
+│       Speech:  ✓ Detected                                                    │
+│  Audio Level:  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░  72%                │
+│                                                                              │
+╰──────────────────────────────────────────────────────────────────────────────╯
+```
+
+## Configuration
+
+The visualization is **enabled by default** but can be disabled if needed.
+
+### Disable Visualization
+
+To disable the visualization, set the environment variable:
+
+```bash
+export VOICEMODE_RECORDING_VISUALIZATION=false
+```
+
+Or add it to your `.voicemode.env` file:
+
+```bash
+VOICEMODE_RECORDING_VISUALIZATION=false
+```
+
+### Enable Visualization (Default)
+
+```bash
+export VOICEMODE_RECORDING_VISUALIZATION=true
+```
+
+## Use Cases
+
+### Why Use Visualization?
+
+1. **Confidence**: See that the microphone is picking up your voice
+2. **Timing**: Know when to start and stop speaking
+3. **Troubleshooting**: Diagnose audio input issues quickly
+4. **Awareness**: Understand when the system will automatically stop recording
+
+### When to Disable
+
+You might want to disable visualization if:
+
+- You are running in a script or automated environment
+- Your terminal doesn't support rich formatting
+- You prefer minimal output
+- You are using VoiceMode programmatically
+
+## Technical Details
+
+### Implementation
+
+The visualization uses the [Rich](https://github.com/Textualize/rich) library for terminal rendering and refreshes the display at 10 FPS during recording.
+
+Key features:
+- Thread-safe updates from audio callback
+- Minimal performance impact
+- Graceful degradation if Rich is not available
+- Proper cleanup on errors or interruption
+
+### Audio Level Calculation
+
+Audio levels are calculated using RMS (Root Mean Square) of the audio samples and normalized to a 0-100% scale:
+
+- **0-30%**: Low/background noise (red)
+- **30-70%**: Normal speech levels (yellow)
+- **70-100%**: Loud speech (green)
+
+### State Machine
+
+The visualization reflects the internal VAD (Voice Activity Detection) state machine:
+
+1. **WAITING**: System is listening but hasn't detected speech yet
+   - No silence timeout applies in this state
+   - Waiting for voice activity to begin
+
+2. **ACTIVE**: Speech has been detected
+   - Recording is actively capturing your voice
+   - Silence counter is reset when speech continues
+
+3. **SILENCE**: Speech has stopped
+   - Accumulating silence duration
+   - Will stop recording when silence threshold is reached (default: 1000ms)
+   - Only applies after minimum duration is met
+
+## Related Configuration
+
+These settings affect the recording behavior shown in the visualization:
+
+```bash
+# Maximum recording duration (default: 120s)
+VOICEMODE_DEFAULT_LISTEN_DURATION=120.0
+
+# Silence threshold before stopping (default: 1000ms)
+VOICEMODE_SILENCE_THRESHOLD_MS=1000
+
+# Minimum recording duration (default: 0.5s)
+VOICEMODE_MIN_RECORDING_DURATION=0.5
+
+# VAD aggressiveness 0-3 (default: 2)
+VOICEMODE_VAD_AGGRESSIVENESS=2
+
+# Set to true to disable silence detection entirely (default: false)
+VOICEMODE_DISABLE_SILENCE_DETECTION=false
+```
+
+## Troubleshooting
+
+### Visualization Not Appearing
+
+1. Check that visualization is enabled:
+   ```bash
+   voicemode config get VOICEMODE_RECORDING_VISUALIZATION
+   ```
+
+2. Ensure Rich library is installed:
+   ```bash
+   uv pip list | grep rich
+   ```
+
+3. Verify terminal supports Rich formatting (most modern terminals do)
+
+### Audio Level Always Low
+
+If the audio level meter shows very low levels:
+
+1. Check microphone permissions
+2. Verify correct input device is selected
+3. Test microphone with system settings
+4. Check microphone volume/gain settings
+
+### Audio Level Always High
+
+If the audio level meter shows constant high levels:
+
+1. Check for background noise
+2. Lower microphone gain
+3. Move away from noise sources
+4. Use a noise-cancelling microphone
+
+## Future Enhancements
+
+Potential future additions:
+
+- Waveform visualization
+- Spectral display
+- Frequency analysis
+- Audio history graph
+- Customizable themes
+- Terminal UI (TUI) mode with keyboard controls
+
+## See Also
+
+- [Voice Activity Detection (VAD)](vad.md)
+- [Configuration Guide](../configuration.md)
+- [Troubleshooting Audio Issues](../troubleshooting/audio.md)
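The "Audio Level Calculation" section of the doc above describes RMS levels normalized to a 0-100% scale with three color bands. A minimal sketch of that idea follows. It is not the code from `voice_mode/recording_visualization.py`, which this diff does not show. The names `rms_level_percent` and `level_color`, the `ASSUMED_MAX_RMS` reference value, and the handling of the exact 30% and 70% boundaries are all assumptions made for illustration.

```python
import math
from typing import Sequence

# Hypothetical full-scale reference: an RMS at or above this maps to 100%.
ASSUMED_MAX_RMS = 3000.0


def rms_level_percent(samples: Sequence[int], max_rms: float = ASSUMED_MAX_RMS) -> float:
    """Return the RMS of the samples as a percentage clamped to 0-100."""
    if len(samples) == 0:
        return 0.0
    # Root mean square: square each sample, average, then take the square root.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(100.0, rms / max_rms * 100.0)


def level_color(percent: float) -> str:
    """Map a percentage to the color bands listed in the doc (boundaries assumed)."""
    if percent < 30.0:
        return "red"
    if percent < 70.0:
        return "yellow"
    return "green"
```

With this reference value, a constant signal of amplitude 1500 lands in the middle band. The real normalization constant may differ, so treat the percentages as illustrative only.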
diff --git a/pyproject.toml b/pyproject.toml
@@ -48,6 +48,7 @@ dependencies = [
     "livekit-plugins-silero>=0.6.5",
     "click>=8.0.0",
     "pyyaml>=6.0.0",
+    "rich>=13.0.0",  # For terminal UI and recording visualization
 ]
 
 [project.optional-dependencies]
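The doc above promises graceful degradation if Rich is not available, even though this hunk makes `rich` a required dependency. One way such a fallback could be structured is sketched below. The function names and the `"rich"` / `"plain"` / `"off"` mode labels are hypothetical and are not taken from the VoiceMode code; only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util


def rich_available() -> bool:
    """Return True if the `rich` package can be found in this environment."""
    return importlib.util.find_spec("rich") is not None


def choose_renderer(enabled: bool) -> str:
    """Pick a rendering mode: 'rich', 'plain', or 'off' (hypothetical labels)."""
    if not enabled:
        return "off"
    # Fall back to plain output instead of failing when Rich is missing.
    return "rich" if rich_available() else "plain"
```

Checking with `find_spec` avoids importing Rich just to test for it, so the check stays cheap when visualization is disabled.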

diff --git a/test_visualization.py b/test_visualization.py
@@ -0,0 +1,114 @@
+#!/usr/bin/env python3
+"""
+Simple test script for recording visualization.
+
+This simulates a recording session with speech detection and silence.
+"""
+
+import time
+import numpy as np
+from voice_mode.recording_visualization import create_visualizer
+
+
+def simulate_recording():
+    """Simulate a recording session with various states."""
+
+    # Configuration
+    max_duration = 10.0
+    silence_threshold_ms = 1000.0
+    min_duration = 2.0
+
+    # Create visualizer
+    visualizer = create_visualizer(
+        max_duration=max_duration,
+        silence_threshold_ms=silence_threshold_ms,
+        min_duration=min_duration,
+        enabled=True
+    )
+
+    print("Starting recording visualization test...")
+    print("This will simulate:")
+    print("1. Waiting for speech (low audio levels)")
+    print("2. Speech detected (high audio levels)")
+    print("3. Silence after speech (accumulating silence)")
+    print()
+
+    visualizer.start()
+
+    try:
+        duration = 0.0
+        dt = 0.1  # Update every 100ms
+        speech_detected = False
+        silence_ms = 0.0
+
+        # Phase 1: Waiting for speech (2 seconds)
+        print("Phase 1: Waiting for speech...")
+        while duration < 2.0:
+            # Low audio level (background noise)
+            audio_level = np.random.uniform(50, 150)
+
+            visualizer.update(
+                duration=duration,
+                audio_level=audio_level,
+                speech_detected=False,
+                silence_ms=0.0,
+                state="WAITING"
+            )
+
+            time.sleep(dt)
+            duration += dt
+
+        # Phase 2: Speech active (3 seconds)
+        print("Phase 2: Speech detected - active recording...")
+        speech_detected = True
+        speech_duration = 0.0
+        while speech_duration < 3.0:
+            # High audio level (speech)
+            audio_level = np.random.uniform(500, 2000)
+
+            visualizer.update(
+                duration=duration,
+                audio_level=audio_level,
+                speech_detected=True,
+                silence_ms=0.0,
+                state="ACTIVE"
+            )
+
+            time.sleep(dt)
+            duration += dt
+            speech_duration += dt
+
+        # Phase 3: Silence after speech (until threshold)
+        print("Phase 3: Silence after speech - accumulating...")
+        while silence_ms < silence_threshold_ms and duration < max_duration:
+            # Low audio level (silence)
+            audio_level = np.random.uniform(20, 80)
+            silence_ms += (dt * 1000)  # Convert to ms
+
+            visualizer.update(
+                duration=duration,
+                audio_level=audio_level,
+                speech_detected=True,
+                silence_ms=silence_ms,
+                state="SILENCE"
+            )
+
+            time.sleep(dt)
+            duration += dt
+
+        print(f"\nRecording complete! Duration: {duration:.1f}s")
+
+    finally:
+        visualizer.stop()
+        print("\nVisualization test complete!")
+
+
+if __name__ == "__main__":
+    try:
+        simulate_recording()
+    except KeyboardInterrupt:
+        print("\n\nTest interrupted by user")
+    except Exception as e:
+        print(f"\n\nError during test: {e}")
+        import traceback
+        traceback.print_exc()
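The test script above hard-codes the state strings for each phase. A sketch of how the state and the stop condition could be derived from the values passed to `visualizer.update()` is shown below, following the state machine described in the doc. The function names and the rule that any positive `silence_ms` means SILENCE are assumptions; the default values mirror the ones listed under "Related Configuration".

```python
def recording_state(speech_detected: bool, silence_ms: float) -> str:
    """Derive the display state from the VAD flags (assumed mapping)."""
    if not speech_detected:
        return "WAITING"
    return "SILENCE" if silence_ms > 0 else "ACTIVE"


def should_stop(
    duration: float,
    silence_ms: float,
    speech_detected: bool,
    *,
    max_duration: float = 120.0,
    silence_threshold_ms: float = 1000.0,
    min_duration: float = 0.5,
) -> bool:
    """Return True once the recording should end."""
    # The maximum duration is a hard limit.
    if duration >= max_duration:
        return True
    # Silence only ends the recording after speech was heard
    # and the minimum duration has been met.
    return (
        speech_detected
        and duration >= min_duration
        and silence_ms >= silence_threshold_ms
    )
```

Keeping both functions free of side effects makes them easy to check without a microphone or a terminal.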
diff --git a/voice_mode/config.py b/voice_mode/config.py
@@ -211,6 +211,9 @@ def load_voicemode_env():
 # Disable silence detection for noisy environments (true/false)
 # VOICEMODE_DISABLE_SILENCE_DETECTION=false
 
+# Enable visual feedback during recording - shows audio levels, duration, silence detection (true/false)
+# VOICEMODE_RECORDING_VISUALIZATION=true
+
 # VAD aggressiveness level 0-3, higher = more strict (default: 2)
 # VOICEMODE_VAD_AGGRESSIVENESS=2
 
@@ -556,6 +559,10 @@ def reload_configuration():
 # Silence detection is enabled by default
 DISABLE_SILENCE_DETECTION = os.getenv("VOICEMODE_DISABLE_SILENCE_DETECTION", "false").lower() in ("true", "1", "yes", "on")
 
+# Enable visual feedback during recording (audio levels, duration, silence detection)
+# Visualization is enabled by default for better user experience
+RECORDING_VISUALIZATION_ENABLED = os.getenv("VOICEMODE_RECORDING_VISUALIZATION", "true").lower() in ("true", "1", "yes", "on")
+
 # VAD (Voice Activity Detection) configuration
 VAD_AGGRESSIVENESS = int(os.getenv("VOICEMODE_VAD_AGGRESSIVENESS", "2"))  # 0-3, higher = more aggressive
 SILENCE_THRESHOLD_MS = int(os.getenv("VOICEMODE_SILENCE_THRESHOLD_MS", "1000"))  # Stop after 1000ms (1 second) of silence
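Both boolean settings in this hunk repeat the same `os.getenv(...).lower() in ("true", "1", "yes", "on")` expression. A small helper could express that pattern once. This is a sketch only: `env_bool` is a hypothetical name that does not appear in `config.py`, and the `strip()` call is an addition that the original expression does not perform.

```python
import os

# Values treated as true, matching the tuple used in config.py.
_TRUTHY = ("true", "1", "yes", "on")


def env_bool(name: str, default: bool) -> bool:
    """Read an environment variable as a boolean, using the default when unset."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # strip() is an assumption added here to tolerate stray whitespace.
    return raw.strip().lower() in _TRUTHY
```

Under this sketch, `env_bool("VOICEMODE_RECORDING_VISUALIZATION", True)` would give the same result as the new `RECORDING_VISUALIZATION_ENABLED` line for any value without surrounding whitespace.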