Talk with your agent by voice — stop watching the screen.
Voice Reply makes your coding agent more than a one-way announcer: it answers the moment you speak, and when it finishes a step it tells you the decision it needs from you. You reply, it continues — a back-and-forth, so your eyes are free but you stay in control.
Works with Claude Code and Codex, with experimental adapters for OpenClaw and Hermes. It speaks Chinese or English (fix one language at setup, or choose automatic per-message switching) and provides an instant opening cue, a decision-first result reply, per-agent voices, one-command setup, and cross-platform playback (macOS / Linux / Windows). Opening cues are pre-synthesized with Edge TTS and cached locally, so they play offline.
This skill is designed for:
- People who run long tasks in Claude Code / Codex and don't want to babysit the screen
- People running multiple agents who want to tell by ear which one finished
- Anyone who wants a voice-feedback layer in their agent workflow
Two spoken moments per turn:
- Opening cue — the instant you submit, a hook plays a quick acknowledgement matched to your message's language and type. It fires before the model reads your message, so it only acknowledges — never pretends to answer. Pre-synthesized and cached, so it plays offline in under a second.
- Result reply — when the turn finishes, the model's one-line reply is spoken: a conclusion, or the decision it needs from you (decision-first). You answer and the loop continues — turning a one-way announcement into a back-and-forth. It can carry the real answer (yes/no, a number, "restart to apply"), in a voice matched to the reply's language.
The intelligence lives in the model, not the script: the model ends each reply
with a <<voice: ...>> line, and the hook simply extracts and speaks it. If the
line is missing, result speech stays silent so the hook never reads long body text
or intermediate status by mistake.
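The extraction rule above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `extractVoiceLine` and the exact regular expression are assumptions, as is the choice to use the last marker when a reply contains several.

```javascript
// Hypothetical helper: pull the spoken line out of a model reply.
// Returns the text of the last <<voice: ...>> marker, or null when there is none.
function extractVoiceLine(reply) {
  // Assumed marker grammar: everything between "<<voice:" and ">>" on one line.
  const pattern = /<<voice:[ \t]*([^\n]*?)>>/g;
  let last = null;
  for (const match of reply.matchAll(pattern)) {
    last = match[1].trim();
  }
  // An empty marker is treated like a missing one: there is nothing to speak.
  return last ? last : null;
}
```

A caller would speak the returned string and do nothing on `null`, which keeps long body text and intermediate status out of the audio.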
| Capability | What It Helps You Do |
|---|---|
| Instant opening cue | Hear immediately that the agent has received the task and started working. |
| Final voice reply | Speak only the final voice marker, so long answers or intermediate status do not get read aloud. |
| Decision-first reminder | When the result needs approval, a choice, or a next step, hear that action first. |
| Chinese + English voice | Use fixed Chinese, fixed English, or automatic language switching per message. |
| Per-agent voice identity | Give Claude Code and Codex different voices so parallel agents are easy to tell apart. |
| Platform | Status |
|---|---|
| Claude Code | ✅ Supported (~/.claude/settings.json hooks) |
| Codex | ✅ Supported (~/.codex/hooks.json) |
| OpenClaw | 🧪 Experimental (adapters/openclaw) |
| Hermes | 🧪 Experimental (adapters/hermes, ~/.hermes/config.yaml shell hooks) |
Playback works on macOS (afplay) and Linux/Windows (ffplay / mpv / mpg123).
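One way to choose a player per platform is sketched below. The function name `pickPlayer`, the `isAvailable` callback, the fallback order, and the quiet-mode arguments are all assumptions made for illustration; the project's `speak.mjs` may do this differently.

```javascript
// Hypothetical sketch: choose a playback command for a platform.
// `platform` follows Node's process.platform values ("darwin", "linux", "win32").
// `isAvailable` is an assumed callback reporting whether a command exists on PATH.
function pickPlayer(platform, isAvailable) {
  if (platform === "darwin") {
    return { cmd: "afplay", args: [] }; // ships with macOS
  }
  // Assumed preference order; the arguments suppress windows and console output.
  const candidates = [
    { cmd: "ffplay", args: ["-nodisp", "-autoexit", "-loglevel", "quiet"] },
    { cmd: "mpv", args: ["--no-video", "--really-quiet"] },
    { cmd: "mpg123", args: ["-q"] },
  ];
  // null means no usable player was found.
  return candidates.find((c) => isAvailable(c.cmd)) ?? null;
}
```

Returning `null` instead of throwing lets the caller stay silent when no player is installed.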
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/chemny/voice-reply/main/install.sh)"
```

The installer guides the full setup: repository checkout, Python environment, Edge TTS, voice cache, Claude Code / Codex hooks, result-marker instructions, and a final sound test. Restart the agent session after it finishes.
The default install location is ~/.agents/skills/voice-reply. Set
VOICE_REPLY_INSTALL_DIR before running the installer if you want a different
folder.
After install + restart, just send a message:
- Ask a question → hear "我看看" ("let me take a look") immediately, then the conclusion (e.g. "对", "right").
- Give an instruction → hear "好,这就做" ("OK, on it"), then "改好了,记得重启" ("fixed, remember to restart") when done.
The installer finishes with an audible test and a self-check report.
Result speech comes from the <<voice: ...>> marker in the agent's final reply.
That marker is written for the ear: short, direct, and focused on the conclusion
or next action.
| Moment | Who decides what to say | What you hear |
|---|---|---|
| You submit | hook classifies the prompt (scripts/opening.mjs, shared) | 我看看 / 好,这就做 / 收到 |
| Agent finishes | the model writes <<voice: …>> | the real result; silent when missing |
The hook scripts only play audio. Playback is fired in the background so hooks return in ~200 ms and never block the agent. Spoken text is hard-capped at 60 chars.
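The length cap and the fire-and-forget playback described above could look roughly like this. Both function names are hypothetical, and counting the 60-character limit in Unicode code points is an assumption; the source only states the cap itself.

```javascript
const MAX_SPOKEN_CHARS = 60; // the cap stated in the text above

// Truncate by Unicode code points (an assumption) so that Chinese text
// and emoji are never cut in the middle of a character.
function capSpokenText(text, limit = MAX_SPOKEN_CHARS) {
  const chars = Array.from(text.trim());
  return chars.length <= limit ? chars.join("") : chars.slice(0, limit).join("");
}

// Assumed shape of background playback: start the player detached and return
// immediately, so the hook does not wait for the audio to finish.
async function playInBackground(cmd, args, file) {
  const { spawn } = await import("node:child_process");
  const child = spawn(cmd, [...args, file], { detached: true, stdio: "ignore" });
  child.on("error", () => {}); // a missing player must not crash the hook
  child.unref(); // let the hook process exit while audio keeps playing
}
```

`detached` together with `unref()` is what allows the hook process to exit before playback ends.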
voice-reply/
├── scripts/
│ ├── speak.mjs # core: text → Edge TTS mp3 → cross-platform player
│ ├── opening.mjs # shared opening-cue rule (both agents)
│ ├── claude-hook.mjs # Claude Code hook entry
│ ├── codex-hook.mjs # Codex hook entry
│ ├── codex-notify.mjs # Codex notify fallback
│ └── manage-hooks.mjs # idempotent install/remove hooks (with backup)
├── adapters/
│ ├── openclaw/ # OpenClaw hook adapter
│ └── hermes/ # Hermes shell-hook adapter
├── install.sh / setup.sh / uninstall.sh / test.sh
├── SKILL.md / README.md / README.zh.md / LICENSE / .gitignore
└── agents/openai.yaml
Runtime data lives in ~/.voice-reply/: config.json (voice/rate/volume),
hooks.json (toggles and fixed texts), cache/ (opening cues).
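As an illustration of how a script might read `config.json` defensively, here is a small sketch. The default values and the function name `loadConfig` are assumptions; only the `voice`, `rate`, and `volume` keys come from the description above.

```javascript
// Assumed defaults for illustration only; the project's real defaults may differ.
const DEFAULT_CONFIG = { voice: "zh-CN-XiaoxiaoNeural", rate: "+0%", volume: "+0%" };

// Parse the raw text of config.json and fall back to defaults on any problem.
function loadConfig(rawJson, defaults = DEFAULT_CONFIG) {
  try {
    const parsed = JSON.parse(rawJson);
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      return { ...defaults };
    }
    // User-provided keys override the defaults; missing keys keep them.
    return { ...defaults, ...parsed };
  } catch {
    return { ...defaults }; // an unreadable config should never break the hook
  }
}
```

For example, `loadConfig('{"rate":"+20%"}')` keeps the default voice and volume and changes only the rate.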
- Node 18+
- Python 3 (runs edge-tts in a local venv)
- An audio player: `afplay` on macOS, or `ffplay` / `mpv` / `mpg123` on Linux/Windows
- Network access (edge-tts uses Microsoft's endpoint)
Ships with Chinese + English opening phrases and classifiers. During install
you can choose Chinese, English, or auto per-message switching. More languages
can be added by extending the packs in scripts/opening.mjs.
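To make the idea of a language pack concrete, here is a hypothetical sketch of a pack table and classifier. The real structure of `scripts/opening.mjs` is not shown in this README, so the `PACKS` shape, the function name `pickOpeningCue`, every regular expression, and the English phrases are assumptions; only the three Chinese cues come from this document.

```javascript
// Hypothetical pack shape: one entry per language.
const PACKS = {
  zh: {
    question: /[?？]|吗|什么|怎么|为什么/,
    instruction: /请|帮我|改|修|加|删/,
    cues: { question: "我看看", instruction: "好,这就做", other: "收到" },
  },
  en: {
    question: /\?|^(what|why|how|is|are|can|does|do)\b/i,
    instruction: /^(please\s+)?(fix|add|run|change|update|remove|make|write)\b/i,
    cues: { question: "Let me look", instruction: "On it", other: "Got it" },
  },
};

const CJK = /[\u4e00-\u9fff]/; // any CJK ideograph selects the Chinese pack in auto mode

// mode is "zh", "en", or "auto" (per-message switching).
function pickOpeningCue(prompt, mode = "auto") {
  const text = prompt.trim();
  const lang = mode === "auto" ? (CJK.test(text) ? "zh" : "en") : mode;
  const pack = PACKS[lang] ?? PACKS.en; // unknown modes fall back to English
  let type = "other";
  if (pack.question.test(text)) type = "question";
  else if (pack.instruction.test(text)) type = "instruction";
  return pack.cues[type];
}
```

Under this shape, adding a language would mean adding one more key to `PACKS` and extending the auto-detection rule.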
Run the doctor first — it pinpoints which link in the chain is broken:
```bash
node scripts/doctor.mjs
```

Common causes:
- Didn't restart the agent: hooks load at session start, so restart Claude Code / Codex after install.
- No audio player (Linux/Windows): install `ffplay` (ffmpeg), `mpv`, or `mpg123`; macOS ships `afplay`.
- Hooks not registered, or the command path got quoted: rerun the one-command installer; it rewrites the hook in the correct form.
- This Codex build has no hooks support (older / some Windows CLIs): use the `notify` fallback, `node scripts/manage-notify.mjs add "$(pwd)"`, then restart Codex. It takes over Codex's `notify` (preserving and chaining your existing one) and speaks the voice marker on completion only, with no opening cue.
- edge-tts not installed: rerun the one-command installer (needs python3 + network).
OpenClaw and Hermes adapters reuse the same shared rules as Claude Code and Codex:
- opening cue: classify the user's prompt and speak a short acknowledgement;
- result reply: speak only the explicit `<<voice: ...>>` marker;
- no marker: stay silent.
OpenClaw files live in adapters/openclaw. Hermes files live in
adapters/hermes; its hook command is configured through ~/.hermes/config.yaml.
Both adapters are marked experimental until their event payloads are validated
across more installs.
The install flow ends by running the doctor and playing a test sound. If you hear it, audio works.