ai-cora commented Dec 19, 2025

Summary

  • Implement a privacy-respecting, opt-in telemetry system for VoiceMode
  • Add an automatic telemetry send on server startup with a 24-hour cooldown
  • Include Cloudflare Worker backend with D1 database and rate limiting

What's Included

Core Telemetry Module (voice_mode/telemetry/)

  • collector.py: Gather stats from conversation logs with session tracking
  • privacy.py: Anonymization with duration/size binning, path sanitization
  • client.py: HTTP transmission with retry logic and offline queueing
  • sender.py: Background telemetry send with cooldown tracking

Configuration

  • VOICEMODE_TELEMETRY: ask/true/false with normalization
  • DO_NOT_TRACK: Universal opt-out standard (overrides all)
  • Anonymous telemetry ID stored in ~/.voicemode/telemetry_id
  • Environment detection: OS, install method, MCP host, exec source
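
A minimal sketch of how the precedence above might resolve. It assumes `DO_NOT_TRACK` counts as set for any non-empty value other than a "false"-like one, and that unrecognized `VOICEMODE_TELEMETRY` values fall back to `ask`. The helper name `resolve_telemetry_preference` and the accepted aliases are hypothetical and are not taken from `voice_mode/config.py`.

```python
from typing import Mapping

# Assumed aliases; the real normalization rules may differ.
TRUE_VALUES = {"true", "1", "yes", "on"}
FALSE_VALUES = {"false", "0", "no", "off"}


def resolve_telemetry_preference(env: Mapping[str, str]) -> str:
    """Return 'true', 'false', or 'ask' (hypothetical helper)."""
    # DO_NOT_TRACK overrides everything else when it is set.
    dnt = env.get("DO_NOT_TRACK", "").strip().lower()
    if dnt and dnt not in FALSE_VALUES:
        return "false"

    # Normalize VOICEMODE_TELEMETRY; anything unrecognized means "ask".
    raw = env.get("VOICEMODE_TELEMETRY", "ask").strip().lower()
    if raw in TRUE_VALUES:
        return "true"
    if raw in FALSE_VALUES:
        return "false"
    return "ask"
```

In real use the mapping would be `os.environ`; taking it as a parameter keeps the precedence rule easy to test.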

Opt-in UX

  • CLI prompt on first interactive use
  • MCP resources: voicemode://telemetry/status and /opt-in-prompt
  • MCP tools: telemetry_set_preference(), telemetry_check_status()

Cloudflare Worker Backend (cloudflare-worker/)

  • D1 database for event storage with 90-day retention
  • KV namespace for rate limiting (10/hour per ID, 100/hour per IP)
  • Payload validation, idempotency, IP hashing for privacy
  • Deployment docs (QUICK_START.md, DEPLOYMENT.md)
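
To illustrate the rate limits and IP hashing listed above, here is an in-memory Python sketch. It is not the Worker code: the real backend stores counters in a KV namespace, and whether it uses fixed or sliding windows, which hash it applies, and whether it salts the IP are not stated here. `FixedWindowLimiter`, `hash_ip`, and the salt parameter are all hypothetical.

```python
import hashlib
import time
from typing import Dict, Optional, Tuple

WINDOW_SECONDS = 3600   # one-hour window
LIMIT_PER_ID = 10       # events per hour per telemetry ID
LIMIT_PER_IP = 100      # events per hour per (hashed) IP


def hash_ip(ip: str, salt: str) -> str:
    """Hash an IP so the raw address is never stored (assumed: salted SHA-256)."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()


class FixedWindowLimiter:
    """Counts requests per key within fixed one-hour windows."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str, limit: int, now: Optional[float] = None) -> bool:
        ts = time.time() if now is None else now
        bucket = (key, int(ts // WINDOW_SECONDS))
        used = self._counts.get(bucket, 0)
        if used >= limit:
            return False
        self._counts[bucket] = used + 1
        return True


def allow_event(limiter: FixedWindowLimiter, telemetry_id: str, ip: str,
                salt: str, now: Optional[float] = None) -> bool:
    """Apply the per-ID limit first, then the per-IP limit.

    The IP check is skipped when the ID check fails; an event that passes
    the ID check but fails the IP check still uses up one unit of ID quota.
    """
    return (limiter.allow("id:" + telemetry_id, LIMIT_PER_ID, now)
            and limiter.allow("ip:" + hash_ip(ip, salt), LIMIT_PER_IP, now))
```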

Privacy Protections

  • Duration bins (<1min, 1-5min, 5-10min, 10-20min, 20-60min, >60min)
  • Exchange count bins (0, 1-5, 6-10, 11-20, >20)
  • Provider normalization (kokoro, whisper-local, openai)
  • No PII, audio content, or conversation text collected
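
The binning above can be sketched as follows. The bin labels come from this PR; how boundary values are assigned (for example, whether exactly 5 minutes lands in `1-5min` or `5-10min`) is an assumption, and the function names `bin_duration` and `bin_exchange_count` are hypothetical rather than the ones in `privacy.py`.

```python
def bin_duration(seconds: float) -> str:
    """Map a session duration to a coarse bin (lower bound inclusive, assumed)."""
    minutes = seconds / 60
    if minutes < 1:
        return "<1min"
    if minutes < 5:
        return "1-5min"
    if minutes < 10:
        return "5-10min"
    if minutes < 20:
        return "10-20min"
    if minutes <= 60:
        return "20-60min"
    return ">60min"


def bin_exchange_count(count: int) -> str:
    """Map an exchange count to a coarse bin."""
    if count <= 0:
        return "0"
    if count <= 5:
        return "1-5"
    if count <= 10:
        return "6-10"
    if count <= 20:
        return "11-20"
    return ">20"
```

Reporting only the bin label means an exact duration or count never leaves the machine, which makes individual sessions harder to fingerprint.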

Test plan

  • Run existing tests to ensure no regressions
  • Test telemetry opt-in prompt in CLI
  • Verify telemetry is disabled by default
  • Test DO_NOT_TRACK environment variable override
  • Verify no telemetry is sent when the user has opted out

🤖 Generated with Claude Code

ai-cora and others added 5 commits December 14, 2025 16:20
Implement privacy-respecting telemetry to understand VoiceMode usage patterns.

Core telemetry module (voice_mode/telemetry/):
- collector.py: Gather stats from conversation logs with proper
  multi-exchange session tracking (uses conversation_id grouping)
- privacy.py: Anonymization with duration/size binning, path sanitization
- client.py: HTTP transmission with retry logic and offline queueing

Configuration (voice_mode/config.py):
- VOICEMODE_TELEMETRY: ask/true/false with normalization
- DO_NOT_TRACK: Universal opt-out standard (overrides all)
- Anonymous telemetry ID stored in ~/.voicemode/telemetry_id
- Environment detection: OS, install method, MCP host, exec source

Opt-in UX:
- CLI prompt on first interactive use (voice_mode/cli.py)
- MCP resources: voicemode://telemetry/status and /opt-in-prompt
- MCP tools: telemetry_set_preference(), telemetry_check_status()

Cloudflare Worker backend (cloudflare-worker/):
- D1 database for event storage with 90-day retention
- KV namespace for rate limiting (10/hour per ID, 100/hour per IP)
- Payload validation, idempotency, IP hashing for privacy
- Deployment docs (QUICK_START.md, DEPLOYMENT.md)

Privacy protections:
- Duration bins (<1min, 1-5min, 5-10min, 10-20min, 20-60min, >60min)
- Exchange count bins (0, 1-5, 6-10, 11-20, >20)
- Provider normalization (kokoro, whisper-local, openai)
- No PII, audio content, or conversation text collected

Documentation (docs/reference/environment.md):
- Telemetry configuration variables
- Three opt-out methods with precedence rules
- What is/isn't collected
- Privacy protections explained

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Add sender.py module with maybe_send_telemetry_background() function
- Hook telemetry send into server.py startup (non-blocking background thread)
- Implement 24-hour cooldown between sends via last_send timestamp
- Log sent payloads locally to ~/.voicemode/logs/telemetry/ for transparency
- Auto-cleanup of telemetry logs older than 30 days
- Fix collector.py to output 'os' instead of 'os_type' to match worker schema
- Add sync-telemetry-local.sh script to export D1 database to local SQLite
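
A minimal sketch of the non-blocking startup hook described above, assuming a daemon thread and that any send failure is swallowed so telemetry can never break server startup. `send_in_background` is a hypothetical stand-in and is not the actual `maybe_send_telemetry_background()` implementation.

```python
import threading
from typing import Callable


def send_in_background(send: Callable[[], None]) -> threading.Thread:
    """Run `send` on a daemon thread and return the thread (hypothetical helper)."""

    def _run() -> None:
        try:
            send()
        except Exception:
            # Assumed policy: telemetry errors are ignored, never raised.
            pass

    # daemon=True so a slow or hung send does not keep the process alive.
    thread = threading.Thread(target=_run, name="telemetry-send", daemon=True)
    thread.start()
    return thread
```

The caller returns immediately after `start()`, so server startup does not wait on the network.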

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Add period_start and period_end fields to telemetry payload
- Collector now accepts start_date and end_date parameters
- Sender uses last period_end as start for next collection
- Fallback to timestamp field for older log entries without period_end
- First send collects last 24 hours, subsequent sends from last period_end

This ensures continuous telemetry coverage without gaps or overlaps,
regardless of how often users start VoiceMode.
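
The window selection described in this commit can be sketched as below. The function name `next_collection_window` is hypothetical, and timezone-aware datetimes are assumed.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def next_collection_window(
    last_period_end: Optional[datetime], now: datetime
) -> Tuple[datetime, datetime]:
    """Return (period_start, period_end) for the next telemetry payload."""
    if last_period_end is None:
        # First send: collect the last 24 hours.
        start = now - timedelta(hours=24)
    else:
        # Later sends: resume exactly where the previous period ended.
        start = last_period_end
    return start, now
```

Because each window starts at the previous `period_end`, consecutive payloads share a boundary, which is what avoids gaps and overlaps.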

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Users now only need to set VOICEMODE_TELEMETRY=true to enable telemetry.
The endpoint URL defaults to the Cloudflare worker.

Note: URL will be updated to a custom domain before release.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Remove separate last_send file and TELEMETRY_DIR constant
- Use period_end from telemetry logs for both cooldown checking and
  continuous coverage
- Simplifies architecture by using log files for dual purpose:
  transparency audit trail AND cooldown tracking
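
A rough sketch of deriving the cooldown from logged payloads, assuming each log record is a mapping with an ISO-8601 `period_end` string that carries a UTC offset, and that older records carry only a `timestamp` field. The on-disk log format and the helper names `latest_period_end` and `cooldown_elapsed` are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, Mapping, Optional

COOLDOWN = timedelta(hours=24)


def latest_period_end(records: Iterable[Mapping[str, str]]) -> Optional[datetime]:
    """Return the most recent period_end across all logged payloads."""
    latest: Optional[datetime] = None
    for record in records:
        # Fall back to `timestamp` for older entries without period_end.
        raw = record.get("period_end") or record.get("timestamp")
        if not raw:
            continue
        parsed = datetime.fromisoformat(raw)
        if latest is None or parsed > latest:
            latest = parsed
    return latest


def cooldown_elapsed(records: Iterable[Mapping[str, str]], now: datetime) -> bool:
    """True when nothing has been sent yet or the last send is 24+ hours old."""
    last = latest_period_end(records)
    return last is None or now - last >= COOLDOWN
```

The same `latest_period_end` value can then serve as the start of the next collection period, so no separate `last_send` file is needed.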

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>