Relax realtime voice turn detection for slower speech #587

Merged
rogerchappel merged 1 commit into main from codex/relax-realtime-barge-in-for-slow-speech
May 23, 2026

@rogerchappel
Owner

Summary

  • Pass OpenClaw realtime VAD tuning through the CrewCMD session endpoint
  • Default realtime sessions to wait 2000ms of silence before ending a user turn
  • Increase prefix padding to 500ms so slower starts and resumes are less likely to be clipped
  • Add route coverage for the default tuning and explicit VAD overrides
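A minimal sketch of how the default tuning and explicit overrides described above could be resolved. The names `VadTuning`, `DEFAULT_VAD_TUNING`, `resolveVadTuning`, and the field names are hypothetical and are not taken from the CrewCMD or OpenClaw code; only the 2000ms and 500ms values come from this PR.

```typescript
// Hypothetical shape for the VAD tuning values; field names are assumptions.
interface VadTuning {
  silenceDurationMs: number; // silence required before the user turn ends
  prefixPaddingMs: number;   // audio kept from before speech was detected
}

// Defaults as stated in the summary: 2000ms of silence, 500ms prefix padding.
const DEFAULT_VAD_TUNING: VadTuning = {
  silenceDurationMs: 2000,
  prefixPaddingMs: 500,
};

// Merge caller-supplied overrides over the defaults. Values that are missing,
// non-finite, or negative fall back to the default (an assumed validation rule).
function resolveVadTuning(overrides: Partial<VadTuning> = {}): VadTuning {
  const pick = (value: number | undefined, fallback: number): number =>
    typeof value === "number" && Number.isFinite(value) && value >= 0
      ? value
      : fallback;

  return {
    silenceDurationMs: pick(
      overrides.silenceDurationMs,
      DEFAULT_VAD_TUNING.silenceDurationMs,
    ),
    prefixPaddingMs: pick(
      overrides.prefixPaddingMs,
      DEFAULT_VAD_TUNING.prefixPaddingMs,
    ),
  };
}
```

With no arguments this returns the defaults; passing `{ silenceDurationMs: 1200 }` changes only the silence window and keeps the 500ms padding.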

Why

OpenClaw defaults realtime voice endpointing to roughly 500ms of silence. That is too aggressive for slower speakers and causes the agent to respond mid-thought.
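To illustrate the effect, here is a rough sketch of silence-based endpointing. It assumes a simplified model in which any pause at least as long as the silence threshold ends the user turn; the helper name and the pause durations are made up for illustration and are not measurements or OpenClaw behavior.

```typescript
// Count how many mid-sentence pauses would end the user's turn early under a
// given silence threshold (simplified model: pause >= threshold ends the turn).
function countPrematureTurnEnds(
  pausesMs: number[],
  silenceThresholdMs: number,
): number {
  return pausesMs.filter((pause) => pause >= silenceThresholdMs).length;
}

// Hypothetical pauses taken by a slower speaker within a single thought.
const midSentencePausesMs = [300, 800, 1500];

// Roughly the previous default: two of the three pauses cut the speaker off.
const cutOffsAt500 = countPrematureTurnEnds(midSentencePausesMs, 500); // 2

// New default from this PR: none of the pauses end the turn.
const cutOffsAt2000 = countPrematureTurnEnds(midSentencePausesMs, 2000); // 0
```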

Verification

  • pnpm test -- src/app/api/runtimes/[id]/talk/realtime/session/route.test.ts src/lib/gateway-client-realtime.test.ts
  • pnpm typecheck
  • git diff --check
  • push hook: pnpm typecheck
  • push hook: pnpm build (passes with existing Turbopack NFT-list warning)

Risk

Low. Scoped to realtime session creation parameters and type coverage. The tradeoff is slower agent turn-taking: with the default raised from roughly 500ms to 2000ms of silence, the agent waits about 1.5 seconds longer after the user finishes speaking.

@rogerchappel rogerchappel merged commit 9128c26 into main May 23, 2026
1 check passed