A cross-platform, local-first transcription application with speaker diarization. Think Otter.ai, but open-source and running entirely on your device.
- Local-First: All processing happens on your device - no data leaves your computer
- Speaker Diarization: Automatically identifies and labels different speakers
- Cross-Platform: Works on macOS, Windows, and Linux
- Multiple Languages: Supports 99+ languages via Whisper
- Privacy Focused: No accounts, no telemetry, no cloud processing
npm install# Download tiny.en Whisper model + diarization models (~100MB total)
npm run download:models
# Or choose a larger model for better accuracy:
npm run download:models:base # ~150MB - good balance
npm run download:models:small # ~450MB - better accuracynpm run dev# Build for current platform
npm run dist
# Build for specific platform
npm run dist:mac
npm run dist:win
npm run dist:linux- Framework: Electron + Vite + Vue 3 + TypeScript
- Speech Recognition: sherpa-onnx with Whisper models (WASM-based, cross-platform)
- Speaker Diarization: pyannote segmentation + wespeaker embeddings via sherpa-onnx
- Build: electron-builder for cross-platform distribution
All models use sherpa-onnx's ONNX-converted Whisper format:
| Model | Size | Languages | Quality |
|---|---|---|---|
| tiny.en | ~60MB | English only | Basic |
| base.en | ~120MB | English only | Good |
| small.en | ~400MB | English only | Better |
sherpa-onnx-pyannote-segmentation-3-0.onnx- Speaker segmentationwespeaker_en_voxceleb_resnet34.onnx- Speaker embeddings
Both are downloaded automatically by the download:models script.
- OS: macOS 10.15+, Windows 10+, Ubuntu 20.04+
- RAM: 4GB minimum (8GB+ recommended)
- Storage: 200MB - 1GB depending on model choice
xscribe/
├── src/
│ ├── main/ # Electron main process
│ │ ├── index.ts # App entry point, IPC handlers
│ │ └── services/ # Transcription & diarization services
│ ├── preload/ # Secure IPC bridge
│ └── renderer/ # Vue frontend
│ ├── App.vue
│ └── components/
├── models/ # ML models (gitignored)
├── scripts/ # Helper scripts
└── build/ # Build resources
# Install dependencies
npm install
# Download models
npm run download:models
# Start development server with hot reload
npm run dev
# Type check
npm run typecheck
# Build production
npm run buildXScribe is designed with privacy as a core principle:
- No cloud processing: All transcription and speaker identification runs locally
- No telemetry: We don't collect any usage data
- No accounts: No sign-up or login required
- Your data stays yours: Audio files are never uploaded anywhere
MIT License - See LICENSE for details.
- OpenAI Whisper - Speech recognition model
- sherpa-onnx - ONNX inference runtime
- pyannote-audio - Speaker diarization models