A cross-platform voice input application that converts speech to text in real time. Press a hotkey, speak, and your words are automatically typed at the cursor position.
- Global Hotkey - Press `Alt+Space` (or a custom shortcut) to start/stop recording from anywhere
- Real-time Transcription - See your speech converted to text instantly
- Auto-paste - Transcribed text is automatically typed at cursor position
- LLM Post-processing - Optional AI polish to add punctuation, fix errors, and improve formatting
- Multi-provider Support - Works with DeepSeek, OpenAI, Kimi, Gemini, Zhipu, Ollama, and any OpenAI-compatible API
- Recording Indicator - Visual indicator shows when recording is active
- History - Browse and copy previous transcriptions
- Dark/Light Theme - Beautiful UI with theme support
- Cross-platform - Works on Windows, macOS, and Linux
Download the latest release for your platform from the Releases page.
- Windows: Download the `.msi` or `.exe` installer
- macOS: Download the `.dmg` file
- Linux: Download the `.AppImage` or `.deb` package
Prerequisites:
# Clone the repository
git clone https://github.com/guangzhaoli/Speaky.git
cd Speaky
# Install dependencies
npm install
# Run in development mode
npm run tauri dev
# Build for production
npm run tauri build
Speaky uses Volcengine Doubao ASR for speech recognition. You'll need to:
- Create an account at Volcengine Console
- Enable the Speech Recognition service
- Create an application and get your credentials:
- App ID
- Access Token
- Secret Key (optional)
Enter these in Settings > General > API Configuration.
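As an illustration of the credential set above, here is a minimal TypeScript sketch of how the three values could be modeled and checked before saving. The `AsrCredentials` interface, its field names, and `validateAsrCredentials` are hypothetical and are not taken from the Speaky source; the only facts assumed from this README are that App ID and Access Token are needed and that Secret Key is optional.

```typescript
// Hypothetical shape for the credentials listed above; the field names are
// illustrative and not taken from the Speaky source.
interface AsrCredentials {
  appId: string;
  accessToken: string;
  secretKey?: string; // optional, as noted in the list above
}

// Returns a list of human-readable problems. An empty list means the
// credentials look complete enough to save.
function validateAsrCredentials(c: AsrCredentials): string[] {
  const problems: string[] = [];
  if (c.appId.trim() === "") problems.push("App ID is required");
  if (c.accessToken.trim() === "") problems.push("Access Token is required");
  // Secret Key may be omitted, but a whitespace-only value is likely a paste error.
  if (c.secretKey !== undefined && c.secretKey !== "" && c.secretKey.trim() === "") {
    problems.push("Secret Key is blank; leave it empty or paste a real value");
  }
  return problems;
}
```

A check like this only catches empty fields; whether the credentials are actually accepted can only be confirmed by the ASR service itself.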
Enable AI-powered text polish in Settings > LLM Polish:
- Toggle "Enable LLM Post-Processing"
- Choose a processing mode:
- General - For everyday text input
- Code - Preserves technical terms and syntax
- Meeting - Formal writing style
- Add and configure an API provider
Supported providers:
- DeepSeek
- OpenAI
- Kimi (Moonshot)
- Google Gemini
- Zhipu (GLM)
- Ollama (Local)
- Any OpenAI-compatible API
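Because every provider above is reached through an OpenAI-compatible API, a polish request can be sketched as a standard chat completions call. The TypeScript below is a minimal sketch under stated assumptions, not Speaky's actual implementation: `ProviderConfig`, `buildPolishRequest`, and the system prompts for each mode are hypothetical, and the real prompts used by the app may differ. It only builds the request; sending it is left to `fetch` or any HTTP client.

```typescript
// The three processing modes described above.
type PolishMode = "general" | "code" | "meeting";

// Hypothetical provider settings. For an OpenAI-compatible API the base URL
// typically ends in /v1, e.g. "https://api.openai.com/v1" for OpenAI or
// "http://localhost:11434/v1" for a local Ollama server.
interface ProviderConfig {
  baseUrl: string;
  apiKey: string;
  model: string;
}

// Illustrative prompts only; these are assumptions, not the app's real prompts.
const SYSTEM_PROMPTS: Record<PolishMode, string> = {
  general:
    "Add punctuation and fix obvious transcription errors. Return only the corrected text.",
  code:
    "Add punctuation and fix transcription errors, but keep technical terms and syntax exactly as dictated. Return only the corrected text.",
  meeting:
    "Rewrite the transcript in a formal writing style with correct punctuation. Return only the rewritten text.",
};

interface PolishRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds a chat completions request that asks the model to polish a transcript.
function buildPolishRequest(
  provider: ProviderConfig,
  mode: PolishMode,
  transcript: string,
): PolishRequest {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const base = provider.baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${provider.apiKey}`,
    },
    body: JSON.stringify({
      model: provider.model,
      messages: [
        { role: "system", content: SYSTEM_PROMPTS[mode] },
        { role: "user", content: transcript },
      ],
    }),
  };
}
```

Keeping the provider as plain data (base URL, key, model) is what makes "any OpenAI-compatible API" workable: switching providers changes configuration, not code.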
- Start Recording: Press the global hotkey (default: `Alt+Space`) or click the microphone button
- Speak: Talk naturally - your speech will be transcribed in real time
- Stop Recording: Release the hotkey or click the button again
- Auto-paste: The transcribed text is automatically typed at your cursor position
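The steps above amount to a two-state cycle: idle, recording, then back to idle with the text pasted. The TypeScript below is a minimal sketch of that cycle as a pure transition function. All names (`RecorderState`, `RecorderEvent`, `step`, and the effect labels) are hypothetical and chosen for illustration; the actual recording logic lives in the Rust backend and may be structured differently.

```typescript
type RecorderState = "idle" | "recording";

// "start" and "stop" stand in for whatever triggers them in the UI:
// the global hotkey or the microphone button.
type RecorderEvent = "start" | "stop";

// Side effect the caller should perform after a transition.
type RecorderEffect = "beginCapture" | "pasteTranscript" | "none";

interface Transition {
  state: RecorderState;
  effect: RecorderEffect;
}

// Pure transition function: given the current state and an event, returns
// the next state and the effect to run. Events that make no sense in the
// current state (e.g. "stop" while idle) are ignored.
function step(state: RecorderState, event: RecorderEvent): Transition {
  if (state === "idle" && event === "start") {
    return { state: "recording", effect: "beginCapture" };
  }
  if (state === "recording" && event === "stop") {
    return { state: "idle", effect: "pasteTranscript" };
  }
  return { state, effect: "none" };
}
```

Ignoring out-of-order events keeps a repeated hotkey press or a stray button click from starting a second capture or pasting twice.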
| Shortcut | Action |
|---|---|
| `Alt+Space` | Start/Stop recording (customizable) |
- Framework: Tauri v2 (Rust + React)
- Frontend: React 19 + TypeScript + TailwindCSS v4
- Audio: cpal (cross-platform audio capture)
- Keyboard: enigo (keyboard simulation)
- ASR: Volcengine Doubao ASR 2.0 (WebSocket binary protocol)
- LLM: OpenAI-compatible API
speaky/
├── src/ # React frontend
│ ├── App.tsx # Main application component
│ ├── main.tsx # React entry point
│ └── style.css # TailwindCSS styles
├── src-tauri/ # Rust backend
│ ├── src/
│ │ ├── lib.rs # Tauri application setup
│ │ ├── commands.rs # IPC commands
│ │ ├── state.rs # Application state
│ │ ├── audio/ # Audio capture
│ │ ├── asr/ # Speech recognition
│ │ ├── input/ # Keyboard simulation
│ │ └── postprocess/ # LLM post-processing
│ └── Cargo.toml # Rust dependencies
├── package.json # Node.js dependencies
└── tauri.conf.json # Tauri configuration
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Tauri for the amazing framework
- Volcengine for the ASR service