181 changes: 181 additions & 0 deletions docs/features/recording-visualization.md
@@ -0,0 +1,181 @@
# Recording Visualization

VoiceMode includes real-time visual feedback during voice recording sessions, making it easy to see when the system is listening, when speech is detected, and how close you are to the silence threshold.

## Features

The recording visualization provides:

- **Audio Level Meter**: Real-time display of microphone input levels (RMS)
- **Duration Counter**: Shows current recording time vs. maximum duration
- **Speech Detection Status**: Indicates whether speech has been detected
- **State Indicator**: Shows current recording state with visual cues:
  - 🔊 **WAITING** (yellow): Waiting for speech to begin
  - 🎤 **ACTIVE** (green): Speech detected, actively recording
  - ⏸️ **SILENCE** (blue): Silence after speech, counting down to stop
- **Silence Progress Bar**: Shows accumulation toward the silence threshold
- **Minimum Duration Progress**: Shows progress toward minimum recording duration

## Example Display

```
╭─────────────────────────────── 🎤 Recording... ───────────────────────────────╮
│                                                                              │
│  Duration: 3.2s / 120.0s                                                     │
│  State: ACTIVE                                                               │
│  Speech: ✓ Detected                                                          │
│  Audio Level: ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░ 72%                   │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
```

## Configuration

The visualization is **enabled by default** but can be disabled if needed.

### Disable Visualization

To disable the visualization, set the environment variable:

```bash
export VOICEMODE_RECORDING_VISUALIZATION=false
```

Or add to your `.voicemode.env` file:

```bash
VOICEMODE_RECORDING_VISUALIZATION=false
```

### Enable Visualization (Default)

```bash
export VOICEMODE_RECORDING_VISUALIZATION=true
```
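
The flag is parsed permissively. The sketch below mirrors the check this PR adds to `voice_mode/config.py`; the helper name `parse_bool_env` is hypothetical and only wraps that expression for illustration.

```python
import os

# Strings treated as "enabled", matching the tuple used in voice_mode/config.py.
TRUTHY = ("true", "1", "yes", "on")


def parse_bool_env(name: str, default: str = "true") -> bool:
    """Hypothetical helper: True when the variable (or its default) is truthy."""
    return os.getenv(name, default).lower() in TRUTHY
```

Matching is case-insensitive, and any value outside the truthy set (for example `0`, `off`, or a typo) disables the visualization.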

## Use Cases

### Why Use Visualization?

1. **Confidence**: See that the microphone is picking up your voice
2. **Timing**: Know when to start and stop speaking
3. **Troubleshooting**: Diagnose audio input issues quickly
4. **Awareness**: Understand when the system will automatically stop recording

### When to Disable

You might want to disable visualization if:

- Running in a script or automated environment
- Terminal doesn't support rich formatting
- You prefer minimal output
- Using VoiceMode programmatically

## Technical Details

### Implementation

The visualization uses the [Rich](https://github.com/Textualize/rich) library for terminal rendering and refreshes the display at 10 FPS during recording.

Key features:
- Thread-safe updates from audio callback
- Minimal performance impact
- Graceful degradation if Rich is not available
- Proper cleanup on errors or interruption
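
One way to get thread-safe updates at a fixed refresh rate is sketched below. This is an illustration under stated assumptions, not the code in `voice_mode/recording_visualization.py`: the names `Snapshot`, `SharedState`, and `render_loop` are hypothetical, and the `draw` callback stands in for whatever Rich rendering the module performs.

```python
import threading
from dataclasses import dataclass, replace

try:
    from rich.console import Console  # optional; only used to detect availability
    RICH_AVAILABLE = True
except ImportError:
    RICH_AVAILABLE = False  # callers can fall back to plain output or skip drawing


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the values the display needs (hypothetical fields)."""
    duration: float = 0.0
    audio_level: float = 0.0
    state: str = "WAITING"


class SharedState:
    """Written by the audio callback thread, read by the render thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snapshot = Snapshot()

    def update(self, **changes) -> None:
        with self._lock:
            self._snapshot = replace(self._snapshot, **changes)

    def read(self) -> Snapshot:
        with self._lock:
            return self._snapshot


def render_loop(shared: SharedState, stop_event: threading.Event, draw, fps: float = 10.0) -> None:
    """Call draw(snapshot) once per frame interval until stop_event is set."""
    while not stop_event.wait(1.0 / fps):
        draw(shared.read())
```

Because the audio callback only swaps in a new immutable snapshot under a short lock, it never waits on terminal I/O, and setting `stop_event` ends the loop within one frame interval.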

### Audio Level Calculation

Audio levels are calculated using RMS (Root Mean Square) of the audio samples and normalized to a 0-100% scale:

- **0-30%**: Low/background noise (red)
- **30-70%**: Normal speech levels (yellow)
- **70-100%**: Loud speech (green)
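
The calculation can be sketched as follows. The full-scale reference of 32768 (16-bit PCM) is an assumption, since this document does not state what the RMS value is normalized against, and the function names are hypothetical. The bands above share their endpoints, so the sketch assigns exactly 30% to yellow and exactly 70% to green, which is also an assumption.

```python
import math

FULL_SCALE = 32768.0  # assumption: samples are 16-bit signed PCM


def rms(samples) -> float:
    """Root mean square of a sequence of samples (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_percent(samples, reference: float = FULL_SCALE) -> float:
    """RMS scaled to 0-100% of the reference level, clamped at 100."""
    return min(100.0, 100.0 * rms(samples) / reference)


def level_color(percent: float) -> str:
    """Map a level to the color bands listed above."""
    if percent < 30:
        return "red"
    if percent < 70:
        return "yellow"
    return "green"


def level_bar(percent: float, width: int = 40) -> str:
    """Text meter with filled and empty cells, like the example display."""
    clamped = max(0.0, min(100.0, percent))
    filled = int(clamped / 100 * width)
    return "▓" * filled + "░" * (width - filled)
```

For example, `level_bar(72)` yields 28 filled cells out of 40, and a chunk whose RMS is half of full scale maps to 50% (yellow).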

### State Machine

The visualization reflects the internal VAD (Voice Activity Detection) state machine:

1. **WAITING**: System is listening but hasn't detected speech yet
   - No timeout in this state
   - Waiting for voice activity to begin

2. **ACTIVE**: Speech has been detected
   - Recording is actively capturing your voice
   - Silence counter is reset when speech continues

3. **SILENCE**: Speech has stopped
   - Accumulating silence duration
   - Recording stops when the silence threshold is reached (default: 1000ms)
   - Only applies after minimum duration is met
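
The transitions above can be sketched as a small tracker that is advanced once per audio frame. This is a simplified model, not VoiceMode's actual VAD loop: the class name `VadTracker` and its fields are hypothetical, the per-frame speech decision is taken as an input, and the maximum recording duration is left out.

```python
from dataclasses import dataclass


@dataclass
class VadTracker:
    """Hypothetical model of the WAITING -> ACTIVE -> SILENCE transitions."""
    silence_threshold_ms: float = 1000.0
    min_duration_s: float = 0.5
    state: str = "WAITING"
    silence_ms: float = 0.0
    duration_s: float = 0.0

    def step(self, frame_is_speech: bool, frame_ms: float) -> bool:
        """Advance by one frame; return True when recording should stop."""
        self.duration_s += frame_ms / 1000.0
        if frame_is_speech:
            # Speech (re)starts: reset the silence counter.
            self.state = "ACTIVE"
            self.silence_ms = 0.0
        elif self.state != "WAITING":
            # Silence only counts once speech has been heard.
            self.state = "SILENCE"
            self.silence_ms += frame_ms
        return (
            self.state == "SILENCE"
            and self.silence_ms >= self.silence_threshold_ms
            and self.duration_s >= self.min_duration_s
        )

    @property
    def silence_progress(self) -> float:
        """Fraction of the silence threshold reached, for a progress bar."""
        return min(1.0, self.silence_ms / self.silence_threshold_ms)
```

Silence before the first speech frame never accumulates, which matches the WAITING state having no timeout, and any speech frame during SILENCE returns the tracker to ACTIVE with the counter at zero.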

## Related Configuration

These settings affect the recording behavior shown in the visualization:

```bash
# Maximum recording duration (default: 120s)
VOICEMODE_DEFAULT_LISTEN_DURATION=120.0

# Silence threshold before stopping (default: 1000ms)
VOICEMODE_SILENCE_THRESHOLD_MS=1000

# Minimum recording duration (default: 0.5s)
VOICEMODE_MIN_RECORDING_DURATION=0.5

# VAD aggressiveness 0-3 (default: 2)
VOICEMODE_VAD_AGGRESSIVENESS=2

# Disable silence detection entirely
VOICEMODE_DISABLE_SILENCE_DETECTION=false
```
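
A sketch of reading these values with their documented defaults follows. The `int(...)` parsing of the VAD aggressiveness and silence threshold matches `voice_mode/config.py` in this PR; the `float(...)` parsing of the two durations, the range check, and the names `RecordingSettings` and `load_settings` are assumptions for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordingSettings:
    """Hypothetical container for the settings listed above."""
    max_duration_s: float
    silence_threshold_ms: int
    min_duration_s: float
    vad_aggressiveness: int


def load_settings(env=None) -> RecordingSettings:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    vad = int(env.get("VOICEMODE_VAD_AGGRESSIVENESS", "2"))
    if not 0 <= vad <= 3:
        # Assumed validation: the documented range is 0-3.
        raise ValueError(f"VOICEMODE_VAD_AGGRESSIVENESS must be 0-3, got {vad}")
    return RecordingSettings(
        max_duration_s=float(env.get("VOICEMODE_DEFAULT_LISTEN_DURATION", "120.0")),
        silence_threshold_ms=int(env.get("VOICEMODE_SILENCE_THRESHOLD_MS", "1000")),
        min_duration_s=float(env.get("VOICEMODE_MIN_RECORDING_DURATION", "0.5")),
        vad_aggressiveness=vad,
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"VOICEMODE_SILENCE_THRESHOLD_MS": "1500"})`.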

## Troubleshooting

### Visualization Not Appearing

1. Check that visualization is enabled:
   ```bash
   voicemode config get VOICEMODE_RECORDING_VISUALIZATION
   ```

2. Ensure the Rich library is installed:
   ```bash
   uv pip list | grep rich
   ```

3. Verify that your terminal supports Rich formatting (most modern terminals do)

### Audio Level Always Low

If the audio level meter shows very low levels:

1. Check microphone permissions
2. Verify correct input device is selected
3. Test microphone with system settings
4. Check microphone volume/gain settings

### Audio Level Always High

If the audio level meter shows constant high levels:

1. Check for background noise
2. Lower microphone gain
3. Move away from noise sources
4. Use a noise-cancelling microphone

## Future Enhancements

Potential future additions:

- Waveform visualization
- Spectral display
- Frequency analysis
- Audio history graph
- Customizable themes
- Terminal UI (TUI) mode with keyboard controls

## See Also

- [Voice Activity Detection (VAD)](vad.md)
- [Configuration Guide](../configuration.md)
- [Troubleshooting Audio Issues](../troubleshooting/audio.md)
1 change: 1 addition & 0 deletions pyproject.toml
@@ -48,6 +48,7 @@ dependencies = [
"livekit-plugins-silero>=0.6.5",
"click>=8.0.0",
"pyyaml>=6.0.0",
"rich>=13.0.0", # For terminal UI and recording visualization
]

[project.optional-dependencies]
114 changes: 114 additions & 0 deletions test_visualization.py
@@ -0,0 +1,114 @@
#!/usr/bin/env python3
"""
Simple test script for recording visualization.

This simulates a recording session with speech detection and silence.
"""

import time
import numpy as np
from voice_mode.recording_visualization import create_visualizer


def simulate_recording():
    """Simulate a recording session with various states."""

    # Configuration
    max_duration = 10.0
    silence_threshold_ms = 1000.0
    min_duration = 2.0

    # Create visualizer
    visualizer = create_visualizer(
        max_duration=max_duration,
        silence_threshold_ms=silence_threshold_ms,
        min_duration=min_duration,
        enabled=True
    )

    print("Starting recording visualization test...")
    print("This will simulate:")
    print("1. Waiting for speech (low audio levels)")
    print("2. Speech detected (high audio levels)")
    print("3. Silence after speech (accumulating silence)")
    print()

    visualizer.start()

    try:
        duration = 0.0
        dt = 0.1  # Update every 100ms
        speech_detected = False
        silence_ms = 0.0

        # Phase 1: Waiting for speech (2 seconds)
        print("Phase 1: Waiting for speech...")
        while duration < 2.0:
            # Low audio level (background noise)
            audio_level = np.random.uniform(50, 150)

            visualizer.update(
                duration=duration,
                audio_level=audio_level,
                speech_detected=False,
                silence_ms=0.0,
                state="WAITING"
            )

            time.sleep(dt)
            duration += dt

        # Phase 2: Speech active (3 seconds)
        print("Phase 2: Speech detected - active recording...")
        speech_detected = True
        speech_duration = 0.0
        while speech_duration < 3.0:
            # High audio level (speech)
            audio_level = np.random.uniform(500, 2000)

            visualizer.update(
                duration=duration,
                audio_level=audio_level,
                speech_detected=True,
                silence_ms=0.0,
                state="ACTIVE"
            )

            time.sleep(dt)
            duration += dt
            speech_duration += dt

        # Phase 3: Silence after speech (until threshold)
        print("Phase 3: Silence after speech - accumulating...")
        while silence_ms < silence_threshold_ms and duration < max_duration:
            # Low audio level (silence)
            audio_level = np.random.uniform(20, 80)
            silence_ms += (dt * 1000)  # Convert to ms

            visualizer.update(
                duration=duration,
                audio_level=audio_level,
                speech_detected=True,
                silence_ms=silence_ms,
                state="SILENCE"
            )

            time.sleep(dt)
            duration += dt

        print(f"\nRecording complete! Duration: {duration:.1f}s")

    finally:
        visualizer.stop()
        print("\nVisualization test complete!")


if __name__ == "__main__":
    try:
        simulate_recording()
    except KeyboardInterrupt:
        print("\n\nTest interrupted by user")
    except Exception as e:
        print(f"\n\nError during test: {e}")
        import traceback
        traceback.print_exc()
7 changes: 7 additions & 0 deletions voice_mode/config.py
@@ -211,6 +211,9 @@ def load_voicemode_env():
# Disable silence detection for noisy environments (true/false)
# VOICEMODE_DISABLE_SILENCE_DETECTION=false

# Enable visual feedback during recording - shows audio levels, duration, silence detection (true/false)
# VOICEMODE_RECORDING_VISUALIZATION=true

# VAD aggressiveness level 0-3, higher = more strict (default: 2)
# VOICEMODE_VAD_AGGRESSIVENESS=2

@@ -556,6 +559,10 @@ def reload_configuration():
# Silence detection is enabled by default
DISABLE_SILENCE_DETECTION = os.getenv("VOICEMODE_DISABLE_SILENCE_DETECTION", "false").lower() in ("true", "1", "yes", "on")

# Enable visual feedback during recording (audio levels, duration, silence detection)
# Visualization is enabled by default for better user experience
RECORDING_VISUALIZATION_ENABLED = os.getenv("VOICEMODE_RECORDING_VISUALIZATION", "true").lower() in ("true", "1", "yes", "on")

# VAD (Voice Activity Detection) configuration
VAD_AGGRESSIVENESS = int(os.getenv("VOICEMODE_VAD_AGGRESSIVENESS", "2")) # 0-3, higher = more aggressive
SILENCE_THRESHOLD_MS = int(os.getenv("VOICEMODE_SILENCE_THRESHOLD_MS", "1000")) # Stop after 1000ms (1 second) of silence