Speaky

A cross-platform voice input application that converts speech to text in real time. Press a hotkey, speak, and your words are typed at the cursor position.

English | 中文

Features

Global Hotkey - Press Alt+Space (or a custom shortcut) to start/stop recording from anywhere
Real-time Transcription - See your speech converted to text instantly
Auto-paste - Transcribed text is automatically typed at the cursor position
LLM Post-processing - Optional AI polish to add punctuation, fix errors, and improve formatting
Multi-provider Support - Works with DeepSeek, OpenAI, Kimi, Gemini, Zhipu, Ollama, and any OpenAI-compatible API
Recording Indicator - Visual indicator shows when recording is active
History - Browse and copy previous transcriptions
Dark/Light Theme - Switch between dark and light UI themes
Cross-platform - Works on Windows, macOS, and Linux

Installation

Pre-built Binaries

Download the latest release for your platform from the Releases page.

Windows: Download the .msi or .exe installer
macOS: Download the .dmg file
Linux: Download the .AppImage or .deb package

Build from Source

Prerequisites:

Node.js 18+
Rust 1.70+
Tauri CLI

# Clone the repository
git clone https://github.com/guangzhaoli/Speaky.git
cd Speaky

# Install dependencies
npm install

# Run in development mode
npm run tauri dev

# Build for production
npm run tauri build

Configuration

ASR (Speech Recognition)

Speaky uses Volcengine Doubao ASR for speech recognition. You'll need to:

Create an account at Volcengine Console
Enable the Speech Recognition service
Create an application and get your credentials:
- App ID
- Access Token
- Secret Key (optional)

Enter these in Settings > General > API Configuration.
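To illustrate the credential requirements above, a settings form could check the fields before saving them. This is only a sketch: the `AsrCredentials` type, its field names, and `validateAsrCredentials` are hypothetical and are not taken from Speaky's source.

```typescript
// Hypothetical shape for the credentials listed above; field names are assumptions.
interface AsrCredentials {
  appId: string;
  accessToken: string;
  secretKey?: string; // optional, per the list above
}

// Returns human-readable problems; an empty array means the input looks usable.
function validateAsrCredentials(creds: AsrCredentials): string[] {
  const problems: string[] = [];
  if (creds.appId.trim() === "") {
    problems.push("App ID is required");
  }
  if (creds.accessToken.trim() === "") {
    problems.push("Access Token is required");
  }
  if (creds.secretKey !== undefined && creds.secretKey.trim() === "") {
    problems.push("Secret Key is optional, but must not be blank when provided");
  }
  return problems;
}
```

A check like this only catches empty fields. Whether the credentials are actually accepted can only be confirmed by the ASR service itself.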

LLM Post-processing (Optional)

Enable AI-powered text polish in Settings > LLM Polish:

Toggle "Enable LLM Post-Processing"
Choose a processing mode:
- General - For everyday text input
- Code - Preserves technical terms and syntax
- Meeting - Formal writing style
Add and configure an API provider

Supported providers:

DeepSeek
OpenAI
Kimi (Moonshot)
Google Gemini
Zhipu (GLM)
Ollama (Local)
Any OpenAI-compatible API
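Because every provider above is reached through an OpenAI-compatible API, one request shape can serve all of them. The sketch below shows roughly how a polish request might be assembled for each processing mode. The `ProviderConfig` fields, the `buildPolishRequest` helper, and the prompt wording are assumptions made for illustration, not Speaky's actual implementation.

```typescript
// Processing modes named in the settings above.
type PolishMode = "general" | "code" | "meeting";

// Hypothetical provider settings; field names are assumptions, not Speaky's schema.
interface ProviderConfig {
  baseUrl: string; // e.g. "https://api.openai.com/v1" or "http://localhost:11434/v1" for Ollama
  apiKey: string;
  model: string;
}

// Illustrative system prompts, one per mode; the wording is invented for this sketch.
const SYSTEM_PROMPTS: Record<PolishMode, string> = {
  general:
    "Add punctuation and fix obvious transcription errors. Return only the corrected text.",
  code:
    "Add punctuation and fix errors, but keep technical terms, identifiers, and syntax unchanged. Return only the corrected text.",
  meeting:
    "Rewrite the transcript in a formal written style with correct punctuation. Return only the corrected text.",
};

interface PolishRequest {
  url: string;
  headers: Record<string, string>;
  body: {
    model: string;
    messages: { role: "system" | "user"; content: string }[];
  };
}

// Builds the request without sending it, so it can be inspected or tested offline.
function buildPolishRequest(
  provider: ProviderConfig,
  mode: PolishMode,
  transcript: string,
): PolishRequest {
  return {
    // OpenAI-compatible servers expose chat completions under this path.
    url: `${provider.baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${provider.apiKey}`,
    },
    body: {
      model: provider.model,
      messages: [
        { role: "system", content: SYSTEM_PROMPTS[mode] },
        { role: "user", content: transcript },
      ],
    },
  };
}
```

The result can be sent with `fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) })`. In an OpenAI-compatible response, the polished text is found at `choices[0].message.content`.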

Usage

Start Recording: Press the global hotkey (default: Alt+Space) or click the microphone button
Speak: Talk naturally - your speech will be transcribed in real time
Stop Recording: Release the hotkey or click the button again
Auto-paste: The transcribed text is automatically typed at your cursor position
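The four steps above can be read as a small state machine. The sketch below is one possible model of that cycle. The state and event names are hypothetical, and it assumes that stopping a recording leads directly to the paste step.

```typescript
// States and events for one dictation cycle; all names are hypothetical.
type RecorderState = "idle" | "recording" | "pasting";
type RecorderEvent = "hotkeyDown" | "hotkeyUp" | "buttonClick" | "pasteDone";

// Pure transition function: events that do not apply leave the state unchanged.
function nextState(state: RecorderState, event: RecorderEvent): RecorderState {
  switch (state) {
    case "idle":
      // Pressing the hotkey or clicking the microphone button starts recording.
      return event === "hotkeyDown" || event === "buttonClick" ? "recording" : state;
    case "recording":
      // Releasing the hotkey or clicking the button again stops recording.
      return event === "hotkeyUp" || event === "buttonClick" ? "pasting" : state;
    case "pasting":
      // Once the text has been typed at the cursor, the cycle returns to idle.
      return event === "pasteDone" ? "idle" : state;
  }
}
```

Keeping the transition logic pure means repeated `hotkeyDown` events, such as those from keyboard auto-repeat, are ignored while a recording is already in progress.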

Keyboard Shortcuts

Shortcut	Action
`Alt+Space`	Start/Stop recording (customizable)

Tech Stack

Framework: Tauri v2 (Rust + React)
Frontend: React 19 + TypeScript + TailwindCSS v4
Audio: cpal (cross-platform audio capture)
Keyboard: enigo (keyboard simulation)
ASR: Volcengine Doubao ASR 2.0 (WebSocket binary protocol)
LLM: OpenAI-compatible API

Project Structure

speaky/
├── src/                    # React frontend
│   ├── App.tsx            # Main application component
│   ├── main.tsx           # React entry point
│   └── style.css          # TailwindCSS styles
├── src-tauri/             # Rust backend
│   ├── src/
│   │   ├── lib.rs         # Tauri application setup
│   │   ├── commands.rs    # IPC commands
│   │   ├── state.rs       # Application state
│   │   ├── audio/         # Audio capture
│   │   ├── asr/           # Speech recognition
│   │   ├── input/         # Keyboard simulation
│   │   └── postprocess/   # LLM post-processing
│   ├── Cargo.toml         # Rust dependencies
│   └── tauri.conf.json    # Tauri configuration
└── package.json           # Node.js dependencies

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Tauri for the amazing framework
Volcengine for the ASR service
