
feat: add voice mode for chat input #1159

Open
jcfs wants to merge 4 commits into main from feature/implement-voice-mode-cv0

Conversation

@jcfs
Contributor

@jcfs jcfs commented May 29, 2026

Adds voice input to the chat composer with three transcription engines (in-browser Web Speech, in-browser Whisper via transformers.js, server-side OpenAI Whisper fallback) and a /settings/voice-mode page so users can pick engine, language, activation mode, auto-send, model size, and shortcut — or turn the whole feature off.

Important Changes

  • New internal/voice backend package with a POST /api/v1/transcribe endpoint behind KANDEV_VOICE_OPENAI_API_KEY (returns 503 when unset; frontend gracefully degrades).
  • User settings gain a nested voice_mode JSON object — no schema migration; old rows default enabled to true via a *bool storage adapter.
  • Whisper Web runs in a Web Worker using @huggingface/transformers with fp16 decoder weights pulled from onnx-community/whisper-* (sidesteps the MatMulNBits scale-missing crash that q4/q8 hit on WASM).
  • Voice setting changes broadcast over the existing user.settings.updated WS event so toggling on one tab updates every open chat tab live.

Validation

  • make -C apps/backend test lint — all packages pass, 0 lint issues.
  • cd apps && pnpm --filter @kandev/web typecheck lint test — green, 2517 tests pass.
  • New tests: voice/transcribe, voice/handlers, useVoiceInput, parseVoiceMode, detectVoiceCapabilities, resolveActiveEngine, transcribeAudio, applyVoiceMode.
  • Manual smoke: localhost desktop (Web Speech path), mobile Safari over Cloudflare tunnel (Whisper Web path with fp16 model download).

Possible Improvements

Low risk for non-voice flows (additive). The Whisper Web first-load is ~75 MB on Base — would be smaller if we ever wire WebGPU + q4 detection. Server-side Whisper currently only reads a global KANDEV_VOICE_OPENAI_API_KEY; a per-user/agent credential lookup is a natural follow-up but explicitly out of scope here.

Checklist

  • I have performed a self-review of my code.
  • I have manually tested my changes and they work as expected.
  • My changes have tests that cover the new functionality and edge cases.
  • If my change touches UI files (apps/web/), I have added or updated Playwright e2e tests in apps/web/e2e/ and verified them with make test-e2e.

Preview Environment

URL https://kandev-pr-1159-bwo7.sprites.app
Commit ffe88a6
Agent Mock agent

Updates automatically on each push. Destroyed when the PR is closed.

Adds a voice input button to the chat composer with three transcription engines:
in-browser Web Speech API, in-browser Whisper via @huggingface/transformers, and
a server-side OpenAI Whisper fallback gated on KANDEV_VOICE_OPENAI_API_KEY.

User-configurable from a new /settings/voice-mode page:
- Master on/off toggle (defaults to on; opt-out, not opt-in)
- Engine preference (auto / Web Speech / Whisper Web / Whisper Server)
- Language hint (auto-detect or BCP-47 pinned)
- Activation mode (click-to-toggle or hold-to-talk)
- Auto-send after transcription
- Whisper Web model size (tiny / base / small)
- Editable Cmd+Shift+M keyboard shortcut

Voice settings live under user_settings.voice_mode as a nested JSON object —
no schema migration. Setting changes broadcast via the existing
user.settings.updated WS event so any open chat tab updates live.
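As a rough illustration of how a nested voice_mode object can be read without a schema migration, the sketch below parses a snake_case payload into camelCase state and falls back to defaults for missing or invalid fields. Only the opt-out default for enabled and the engine and model-size values come from the description above; the snake_case field names and the remaining defaults are assumptions for this example, not the PR's actual parseVoiceMode.

```typescript
// Hypothetical shapes: field names and most defaults are assumptions.
type VoiceEngine = "auto" | "webSpeech" | "whisperWeb" | "whisperServer";
type ActivationMode = "toggle" | "hold";
type WhisperModel = "tiny" | "base" | "small";

interface VoiceModeState {
  enabled: boolean;
  engine: VoiceEngine;
  language: string; // "auto" or a BCP-47 tag
  activationMode: ActivationMode;
  autoSend: boolean;
  whisperModel: WhisperModel;
}

const ENGINES: readonly VoiceEngine[] = ["auto", "webSpeech", "whisperWeb", "whisperServer"];
const ACTIVATION_MODES: readonly ActivationMode[] = ["toggle", "hold"];
const WHISPER_MODELS: readonly WhisperModel[] = ["tiny", "base", "small"];

const DEFAULT_VOICE_MODE_STATE: VoiceModeState = {
  enabled: true, // opt-out, as stated in the description
  engine: "auto",
  language: "auto",
  activationMode: "toggle", // assumed default
  autoSend: false, // assumed default
  whisperModel: "base", // assumed default
};

// Returns `value` when it is one of the allowed literals, else the fallback.
function oneOf<T extends string>(value: unknown, allowed: readonly T[], fallback: T): T {
  for (const candidate of allowed) {
    if (candidate === value) return candidate;
  }
  return fallback;
}

// Legacy rows carry no voice_mode at all, so every field must tolerate absence.
function parseVoiceMode(raw: unknown): VoiceModeState {
  const d = DEFAULT_VOICE_MODE_STATE;
  if (typeof raw !== "object" || raw === null) return { ...d };
  const r = raw as Record<string, unknown>;
  const language = r["language"];
  const enabled = r["enabled"];
  const autoSend = r["auto_send"];
  return {
    enabled: typeof enabled === "boolean" ? enabled : d.enabled,
    engine: oneOf(r["engine"], ENGINES, d.engine),
    language: typeof language === "string" && language !== "" ? language : d.language,
    activationMode: oneOf(r["activation_mode"], ACTIVATION_MODES, d.activationMode),
    autoSend: typeof autoSend === "boolean" ? autoSend : d.autoSend,
    whisperModel: oneOf(r["whisper_model"], WHISPER_MODELS, d.whisperModel),
  };
}
```

Treating an absent enabled flag as true is what keeps the feature opt-out for rows written before the setting existed.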

The mic button is rendered on the right of the composer alongside the send
button (round primary fill; destructive fill with pulse ring while recording).
On browsers without a usable engine (e.g. iOS Chrome over HTTP) the button
stays visible as a muted greyed icon — tapping it surfaces a toast with the
reason (typically a missing secure context).
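The greyed-out button behaviour can be pictured as a two-step decision: detect which engines are usable, then resolve the user's preference against that result. The sketch below is a simplified model, not the PR's detectVoiceCapabilities or resolveActiveEngine. It assumes every engine needs a secure context, takes the environment as a plain object (the VoiceEnv shape is hypothetical) so it runs without a browser, and assumes a pinned but unavailable engine resolves to null rather than silently falling back.

```typescript
type ConcreteEngine = "webSpeech" | "whisperWeb" | "whisperServer";
type EnginePreference = "auto" | ConcreteEngine;

// Hypothetical environment snapshot; in a browser these would come from
// window.isSecureContext, feature checks, and the server's 503 signal.
interface VoiceEnv {
  secureContext: boolean;
  hasSpeechRecognition: boolean;
  hasMediaRecorder: boolean;
  hasWorker: boolean;
  serverConfigured: boolean;
}

interface VoiceCapabilities {
  engines: Record<ConcreteEngine, boolean>;
  reason: string | null; // set only when no engine is usable
}

// Assumed priority for the "auto" preference.
const AUTO_ORDER: readonly ConcreteEngine[] = ["webSpeech", "whisperWeb", "whisperServer"];

function detectVoiceCapabilities(env: VoiceEnv): VoiceCapabilities {
  const canCapture = env.secureContext && env.hasMediaRecorder;
  const engines: Record<ConcreteEngine, boolean> = {
    webSpeech: env.secureContext && env.hasSpeechRecognition,
    whisperWeb: canCapture && env.hasWorker,
    whisperServer: canCapture && env.serverConfigured,
  };
  let reason: string | null = null;
  if (!AUTO_ORDER.some((engine) => engines[engine])) {
    reason = env.secureContext
      ? "no supported speech engine in this browser"
      : "voice input needs a secure context (HTTPS or localhost)";
  }
  return { engines, reason };
}

function resolveActiveEngine(
  preference: EnginePreference,
  caps: VoiceCapabilities,
): ConcreteEngine | null {
  if (preference !== "auto") {
    return caps.engines[preference] ? preference : null;
  }
  for (const engine of AUTO_ORDER) {
    if (caps.engines[engine]) return engine;
  }
  return null;
}
```

A null result corresponds to the muted button state, with reason supplying the toast text.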

Whisper Web runs in a Web Worker with fp16 decoder weights (avoids the
MatMulNBits scale-missing crash that q4/q8 hit on WASM). Models are pulled
from the onnx-community/* mirror and cached by transformers.js in the Cache
Storage on first use.
@coderabbitai

coderabbitai Bot commented May 29, 2026

Review Change Stack

📝 Walkthrough


Adds full-stack voice input: server-side transcription fallback (OpenAI Whisper), client engines (Web Speech, Whisper Web, Whisper Server), user settings UI and persistence, chat composer integration with transcript insertion and optional auto-send, keyboard shortcut, worker for in-browser Whisper, and tests across backend and frontend.

Changes

Voice Input & Transcription Feature

| Layer | File(s) | Summary |
| --- | --- | --- |
| Backend Configuration & Voice Settings Models | apps/backend/internal/common/config/config.go, apps/backend/internal/user/models/models.go, apps/backend/internal/user/dto/dto.go | Config gains Voice section with OpenAI API key; UserSettings adds VoiceMode with engine, language, activation mode, auto-send, and Whisper model preferences; DTO updated to serialize/deserialize voice settings. |
| Backend Transcription Service & Handler | apps/backend/internal/voice/transcribe/service.go, apps/backend/internal/voice/handlers/transcribe_handlers.go, apps/backend/internal/voice/transcribe/service_test.go, apps/backend/internal/voice/handlers/transcribe_handlers_test.go | New transcribe.Service calls OpenAI Whisper endpoint with multipart audio, handles ErrNotConfigured, returns UpstreamError for non-2xx responses. HTTP handler validates configuration, enforces size limits, parses multipart audio, and returns 503/502/500 based on error type. Full test coverage for success and error paths. |
| Backend User Settings Orchestration & Persistence | apps/backend/internal/user/service/service.go, apps/backend/internal/user/controller/controller.go, apps/backend/internal/user/store/sqlite.go, apps/backend/cmd/kandev/... | Service validates voice mode update fields (engine, mode, model), normalizes language, preserves existing values for partial updates, publishes event with voice_mode. Controller forwards voice mode from request. SQLite layer persists/hydrates voice_mode JSON with defaults for legacy rows. Configuration wiring passes OpenAI key through to handler. |
| Frontend Type System & Voice API Client | apps/web/lib/types/http-voice.ts, apps/web/lib/types/http.ts, apps/web/lib/types/backend.ts, apps/web/lib/api/domains/settings-api.ts, apps/web/lib/api/domains/voice-api.ts | TypeScript types for voice engines (webSpeech/whisperWeb/whisperServer/auto), activation modes, and Whisper model sizes. transcribeAudio API client posts multipart FormData to backend, extracts error messages, throws ApiError on failure. User settings update payload now includes optional voice_mode field. |
| Frontend Voice Capture & Processing Engine | apps/web/hooks/use-voice-input.ts, apps/web/hooks/use-voice-input.test.ts, apps/web/lib/voice/capabilities.ts, apps/web/lib/voice/capabilities.test.ts, apps/web/lib/voice/whisper-web-client.ts, apps/web/lib/voice/whisper-web-models.ts, apps/web/workers/whisper-web.worker.ts | Core useVoiceInput hook supports Web Speech API and audio capture with server/in-browser transcription; manages state transitions (idle → requesting → recording → processing), error mapping, and model-load progress. Capability detection identifies available engines with SSR safety. WhisperWebClient wraps worker communication with single-request concurrency and progress tracking. Web Worker loads Hugging Face Transformers ASR pipeline, handles init/transcribe/dispose messages. |
| Frontend Voice Settings Page & Configuration UI | apps/web/app/settings/voice-mode/page.tsx, apps/web/components/settings/voice-mode-settings.tsx, apps/web/components/settings/settings-app-sidebar.tsx, apps/web/components/settings/keyboard-shortcuts-card.tsx | New voice mode settings page fetches initial state from server, renders enable toggle, engine card with capability-aware options, language/mode/auto-send selections, Whisper model picker, availability banner for unsupported contexts, and keyboard shortcut recorder. Sidebar refactored with reusable SimpleSidebarRows helper; adds "Voice Mode" entry under microphone icon. ShortcutRecorder exported for reuse. |
| Frontend Chat Input Voice Integration | apps/web/components/task/chat/voice-input-button.tsx, apps/web/components/task/chat/chat-input-container.tsx, apps/web/components/task/chat/chat-input-body.tsx, apps/web/components/task/chat/chat-input-toolbar.tsx | New VoiceInputButton returns null when disabled, renders unsupported fallback for non-HTTPS contexts, otherwise shows enabled button wired to useVoiceInput hook with hold/toggle mode support, error toasts, recording indicator, and keyboard shortcut toggling. Chat input container inserts transcribed text at cursor with smart spacing, optionally auto-sends. Editor area and toolbar props extended to accept voice callbacks. |
| Frontend State Management & SSR Integration | apps/web/lib/state/slices/settings/types.ts, apps/web/lib/state/slices/settings/settings-slice.ts, apps/web/lib/ssr/user-settings.ts, apps/web/lib/ssr/user-settings.test.ts, apps/web/lib/ws/handlers/users.ts, apps/web/lib/keyboard/constants.ts, apps/web/lib/keyboard/shortcut-overrides.ts, apps/web/hooks/use-user-display-settings.ts, apps/web/package.json | Settings state slice initializes voiceMode from DEFAULT_VOICE_MODE_STATE. SSR parser converts backend snake_case to camelCase with defaults. WebSocket handler updates voice mode on user.settings.updated event. New VOICE_INPUT_TOGGLE keyboard shortcut (Ctrl/Cmd+Shift+M) added to configurable list. Carry-forward logic defaults voice mode when missing. Added @huggingface/transformers dependency. |
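The chat input row above mentions inserting transcribed text at the cursor with smart spacing. One plausible reading of that is sketched below, assuming "smart" means adding a single space only where the transcript would otherwise touch a neighbouring non-whitespace character. The insertTranscript name and the exact rule are illustrative assumptions, not taken from chat-input-container.tsx.

```typescript
interface InsertResult {
  text: string; // composer content after insertion
  cursor: number; // caret position just past the inserted transcript
}

// Hypothetical helper: inserts `transcript` at `cursor`, padding with one
// space on either side only when the adjacent character is not whitespace.
function insertTranscript(current: string, cursor: number, transcript: string): InsertResult {
  const position = Math.max(0, Math.min(cursor, current.length));
  const piece = transcript.trim();
  if (piece === "") {
    return { text: current, cursor: position };
  }
  const before = current.slice(0, position);
  const after = current.slice(position);
  const needsLeadingSpace = before.length > 0 && !/\s$/.test(before);
  const needsTrailingSpace = after.length > 0 && !/^\s/.test(after);
  const inserted = (needsLeadingSpace ? " " : "") + piece + (needsTrailingSpace ? " " : "");
  return {
    text: before + inserted + after,
    cursor: before.length + inserted.length,
  };
}

// Example: dictating "bar" with the caret between "foo " and "baz".
const example = insertTranscript("foo baz", 4, "bar");
console.log(example.text); // "foo bar baz"
```

Returning the new caret position alongside the text lets the caller restore the selection before an optional auto-send.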

Sequence Diagrams

sequenceDiagram
  participant User
  participant Chat as Chat Composer
  participant Button as Voice Button
  participant Hook as useVoiceInput
  participant WebSpeech as Web Speech API
  participant Server as Backend /api/v1/transcribe
  participant OpenAI as OpenAI Whisper

  User->>Chat: press voice shortcut or click voice button
  Chat->>Button: trigger onVoiceTranscript/onAutoSend
  Button->>Hook: start()
  Hook->>WebSpeech: SpeechRecognition.start() or beginCapture()
  WebSpeech-->>Hook: onresult with transcript
  Hook->>Chat: onTranscript("hello world")
  Chat->>Chat: insert at cursor
  Hook->>Server: optional transcribe via server (capture path)
  Server->>OpenAI: multipart upload
  OpenAI-->>Server: {text: ...}
  Server-->>Hook: {text: ...}
  Hook-->>Chat: onTranscript("...from server")
sequenceDiagram
  participant Browser
  participant Worker as Whisper Web Worker
  participant HF as Hugging Face CDN

  Browser->>Worker: {type: 'init', model: 'base'}
  Worker->>HF: fetch Transformers model files
  HF-->>Worker: model bytes, emit progress
  Worker-->>Browser: {type: 'ready'}
  Browser->>Worker: {type: 'transcribe', audio: Float32Array}
  Worker->>Worker: pipeline.transcribe(audio)
  Worker-->>Browser: {type: 'result', text: '...'}

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested Labels

codex

Suggested Reviewers

  • zeval

🐰 A whisper, a click, a captured thought,
From voices bright to code well-wrought,
The microphone now hears your call,
While toggles dance and settings sprawl. 🎙️✨

@claude

claude Bot commented May 29, 2026

Claude finished @jcfs's task in 7m 46s


Code Review — feat: add voice mode for chat input

  • Read skill instructions and gather context
  • Analyze backend changes (voice package, user settings, config)
  • Analyze frontend changes (voice components, hooks, workers)
  • Identify security, correctness, and architecture issues
  • Post findings

Findings

Blocker (must fix before merge)

  1. Capture engine paths untested: apps/web/hooks/use-voice-input.test.ts:199

    • Issue: The test file only exercises the Web Speech path. beginCapture, finishCapture, stopCapture, transcribeViaServer, and transcribeViaWhisperWeb have zero test coverage. The transcribeAudio mock is already wired but never triggered.
    • Why: The whisperServer and whisperWeb paths contain the most branchy logic in the hook — empty blob handling, 503 error mapping, driverRef nulling timing — none of which is exercised.
    • Fix: Add tests for whisperServer engine (mock MediaRecorder + getUserMedia, verify transcribeAudio is called and transcript is emitted); empty blob (no onTranscript call); and 503 error path (emits code: "not-configured").
  2. WhisperWebClient has no test file: apps/web/lib/voice/whisper-web-client.ts

    • Issue: The Worker messaging protocol, pending-request queue, and blobToWhisperFloat32 audio-resample function are all untested. There is no whisper-web-client.test.ts.
    • Why: This is an API-client-class with non-trivial logic: it manages a single-inflight-request queue, maps Worker postMessage/onmessage to Promises, and handles dispose-during-pending. A broken pending reset on dispose or a wrong ready check in init() would silently degrade the Whisper Web path.
    • Fix: Create apps/web/lib/voice/whisper-web-client.test.ts. Mock the Worker constructor to intercept postMessage and fire onmessage events manually. Cover: init resolves on { type: "ready" }, rejects on { type: "error" }, is a no-op when model already loaded; transcribe rejects before init; concurrent call rejected; dispose rejects pending.
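To make the second blocker concrete, here is a minimal sketch of the kind of single-inflight worker client and fake worker such a test file could exercise. It is not the PR's WhisperWebClient: the message shapes follow the walkthrough's sequence diagram, while SingleFlightClient, FakeWorker, and WorkerLike are hypothetical names, and init is simplified to fire-and-forget instead of returning a Promise.

```typescript
type ToWorker =
  | { type: "init"; model: string }
  | { type: "transcribe"; audio: Float32Array };
type FromWorker =
  | { type: "ready" }
  | { type: "result"; text: string }
  | { type: "error"; message: string };

// Minimal slice of the Worker interface, so a fake can stand in for it.
interface WorkerLike {
  onmessage: ((event: { data: FromWorker }) => void) | null;
  postMessage(message: ToWorker): void;
  terminate(): void;
}

interface Pending {
  resolve: (text: string) => void;
  reject: (error: Error) => void;
}

class SingleFlightClient {
  private worker: WorkerLike;
  private ready = false;
  private pending: Pending | null = null;

  constructor(worker: WorkerLike) {
    this.worker = worker;
    this.worker.onmessage = (event) => this.handle(event.data);
  }

  isReady(): boolean { return this.ready; }
  isBusy(): boolean { return this.pending !== null; }

  init(model: string): void {
    if (!this.ready) this.worker.postMessage({ type: "init", model });
  }

  transcribe(audio: Float32Array): Promise<string> {
    if (!this.ready) return Promise.reject(new Error("not initialised"));
    if (this.pending) return Promise.reject(new Error("transcription already in flight"));
    return new Promise<string>((resolve, reject) => {
      this.pending = { resolve, reject };
      this.worker.postMessage({ type: "transcribe", audio });
    });
  }

  dispose(): void {
    this.worker.terminate();
    this.ready = false;
    this.settle((p) => p.reject(new Error("disposed")));
  }

  private handle(message: FromWorker): void {
    if (message.type === "ready") {
      this.ready = true;
    } else if (message.type === "result") {
      const text = message.text;
      this.settle((p) => p.resolve(text));
    } else {
      const reason = message.message;
      this.settle((p) => p.reject(new Error(reason)));
    }
  }

  // Clears the pending slot before settling so a new request can start.
  private settle(fn: (p: Pending) => void): void {
    const p = this.pending;
    this.pending = null;
    if (p) fn(p);
  }
}

// Test double: records outgoing messages and lets the test emit replies.
class FakeWorker implements WorkerLike {
  onmessage: ((event: { data: FromWorker }) => void) | null = null;
  sent: ToWorker[] = [];
  terminated = false;
  postMessage(message: ToWorker): void { this.sent.push(message); }
  terminate(): void { this.terminated = true; }
  emit(data: FromWorker): void { if (this.onmessage) this.onmessage({ data }); }
}
```

Injecting the worker through the constructor is the design choice that makes the protocol testable; with a real Worker constructor the same effect needs a module-level mock.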

Suggestion (recommended, doesn't block)

  1. Misleading 400 when body exceeds 10 MB: apps/backend/internal/voice/handlers/transcribe_handlers.go:57-61
    • Issue: When MaxBytesReader fires during multipart parsing, c.FormFile("audio") returns an error that is mapped to 400 "audio file is required". The correct status for an oversized payload is 413 StatusRequestEntityTooLarge.
    • Fix: Check errors.As(err, new(*http.MaxBytesError)) before the generic 400 and return 413 instead. See inline comment for the exact snippet.
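Distinguishing 413 from 400 on the server only pays off if the client treats the statuses differently. The sketch below shows one way the frontend could map transcription response statuses to error codes. Only the 503 to "not-configured" mapping is mentioned in this review; the other code names and the mapTranscribeStatus helper are hypothetical.

```typescript
// Only "not-configured" appears in the review; the other codes are
// illustrative names invented for this sketch.
type TranscribeErrorCode =
  | "not-configured"
  | "too-large"
  | "upstream"
  | "bad-request"
  | "unknown";

function mapTranscribeStatus(status: number): TranscribeErrorCode {
  switch (status) {
    case 503:
      return "not-configured"; // server has no OpenAI key set
    case 413:
      return "too-large"; // body exceeded the size cap
    case 502:
      return "upstream"; // OpenAI returned a non-2xx response
    case 400:
      return "bad-request"; // e.g. missing audio part
    default:
      return "unknown";
  }
}

console.log(mapTranscribeStatus(413)); // "too-large"
```

With a dedicated 413 the UI can tell the user the recording was too long instead of reporting a missing file.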

Summary

| Severity | Count |
| --- | --- |
| Blocker | 2 |
| Suggestion | 1 |

Verdict: Blocked — fix blockers first.

The backend work (transcription service, config, user settings persistence, validation) is well-structured: the *bool storage adapter for enabled backward-compat, the mergeVoiceModeDefaults approach, the MaxBytesReader cap, and the error type hierarchy are all solid. The two blockers are purely on test coverage for the frontend capture paths — use /tdd to fill those gaps before merge.

@greptile-apps

greptile-apps Bot commented May 29, 2026

Greptile Summary

Adds a three-engine voice input feature to the chat composer (Web Speech API, in-browser Whisper via @huggingface/transformers, and a server-side OpenAI Whisper fallback) along with a /settings/voice-mode page for user configuration. The implementation is additive with careful degradation: the server endpoint returns 503 when unconfigured, old user rows default enabled to true via a *bool storage adapter, and voice settings broadcast live over the existing user.settings.updated WebSocket event.

  • Backend: New internal/voice package with POST /api/v1/transcribe (unauthenticated, 10 MB cap, key-guarded 503), plus VoiceModeSettings wired into the user settings store/service with full validation.
  • Frontend: useVoiceInput hook abstracts all three engines behind a unified start/stop/cancel API; WhisperWebClient proxies the transformers.js Web Worker; VoiceInputButton handles toggle/hold activation, keyboard shortcut, auto-send, and browser-capability fallback.

Confidence Score: 5/5

Safe to merge. The change is purely additive — all new paths are gated behind the voice_mode.enabled user setting, which defaults to true but can be toggled off per user. Non-voice flows are unaffected.

The backend is well-guarded (503 when unconfigured, 10 MB cap, enum validation on all voice settings fields), the frontend gracefully degrades across all three engine paths, and the settings broadcast correctly propagates changes across tabs. Only cosmetic and documentation-level concerns were found.

No files require special attention. The pnpm lock file is worth a second look for the dev-snapshot ONNX Runtime packages, but this is a transitive dependency from @huggingface/transformers and doesn't affect correctness.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/backend/internal/voice/handlers/transcribe_handlers.go | New HTTP handler for POST /api/v1/transcribe; size limiting via MaxBytesReader with correct 413 handling, 503 short-circuit when unconfigured. |
| apps/backend/internal/voice/transcribe/service.go | OpenAI Whisper proxy; well-typed error handling (ErrNotConfigured, UpstreamError), MIME→extension mapping, and functional-options pattern for testability. |
| apps/backend/internal/user/store/sqlite.go | Adds voice_mode persistence using storedVoiceMode (*bool for enabled) to correctly handle pre-existing rows that predate the column. |
| apps/web/hooks/use-voice-input.ts | Core voice hook; driver abstraction (webSpeech / capture) is clean. whisperErrorMessage always returns code "model-load" for any Whisper error, including runtime transcription failures. |
| apps/web/lib/voice/whisper-web-client.ts | Robust Worker client with single-inflight guard, stale-worker error protection, and clean audio resample path. |
| apps/web/workers/whisper-web.worker.ts | transformers.js worker using fp16 decoder dtype to avoid MatMulNBits crash on WASM; model reuse guard prevents redundant downloads. |
| apps/web/lib/voice/whisper-web-models.ts | Model ID registry for onnx-community mirrors; contains a stale comment saying "q8 variant" while the worker actually uses fp16. |
| apps/web/components/task/chat/voice-input-button.tsx | Toggle/hold activation, keyboard shortcut binding, auto-send via rAF, and accessibility labels are all well-implemented. |
| apps/web/components/settings/voice-mode-settings.tsx | Settings page with optimistic updates, key-scoped rollback on failure, and live capability detection; correctly uses storeApi to avoid stale closures. |
| apps/pnpm-lock.yaml | Adds @huggingface/transformers@4.2.0 which pulls dev-build snapshots of onnxruntime-web and onnxruntime-common; also surfaces a CWE-502 deprecation on @ungap/structured-clone@1.3.0. |

Sequence Diagram

sequenceDiagram
    participant User
    participant VoiceInputButton
    participant useVoiceInput
    participant WebSpeechAPI
    participant WhisperWebClient
    participant Worker as whisper-web.worker
    participant BackendAPI as /api/v1/transcribe
    participant OpenAI as OpenAI Whisper

    User->>VoiceInputButton: click / hold / shortcut
    VoiceInputButton->>useVoiceInput: start()
    useVoiceInput->>useVoiceInput: resolveActiveEngine()

    alt "engine = webSpeech"
        useVoiceInput->>WebSpeechAPI: recognition.start()
        WebSpeechAPI-->>useVoiceInput: onresult (transcript)
        useVoiceInput-->>VoiceInputButton: onTranscript(text)
    else "engine = whisperWeb"
        useVoiceInput->>useVoiceInput: getUserMedia() → MediaRecorder.start()
        User->>VoiceInputButton: stop
        useVoiceInput->>WhisperWebClient: transcribe(blob)
        WhisperWebClient->>WhisperWebClient: blobToWhisperFloat32()
        WhisperWebClient->>Worker: "postMessage({type:transcribe, audio})"
        Worker->>Worker: asrPipeline(audio)
        Worker-->>WhisperWebClient: "postMessage({type:result, text})"
        WhisperWebClient-->>useVoiceInput: text
        useVoiceInput-->>VoiceInputButton: onTranscript(text)
    else "engine = whisperServer"
        useVoiceInput->>useVoiceInput: getUserMedia() → MediaRecorder.start()
        User->>VoiceInputButton: stop
        useVoiceInput->>BackendAPI: POST /api/v1/transcribe
        BackendAPI->>OpenAI: POST /v1/audio/transcriptions
        OpenAI-->>BackendAPI: "{text}"
        BackendAPI-->>useVoiceInput: "{text}"
        useVoiceInput-->>VoiceInputButton: onTranscript(text)
    end

    VoiceInputButton->>VoiceInputButton: insertText into editor
    opt auto-send enabled
        VoiceInputButton->>VoiceInputButton: rAF → handleVoiceAutoSend()
    end

Reviews (4): Last reviewed commit: "fix: voice mode progress scale, hold-mod..."

Comment thread apps/backend/internal/voice/handlers/transcribe_handlers.go
Comment thread apps/backend/internal/voice/handlers/transcribe_handlers.go
Comment thread apps/web/hooks/use-voice-input.ts
Comment thread apps/web/hooks/use-voice-input.test.ts
Comment thread apps/web/lib/voice/whisper-web-client.ts
Comment thread apps/backend/internal/voice/handlers/transcribe_handlers.go

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
apps/web/lib/voice/whisper-web-models.ts (1)

5-5: ⚡ Quick win

Stale modelId comment contradicts the onnx-community rationale below.

This doc says the Xenova/* mirrors are used, but every entry uses onnx-community/* and lines 13-17 explain Xenova was deliberately avoided (MatMulNBits/WASM failures). Leaving this could prompt a regression back to Xenova.

📝 Align the comment with the actual mirror
-  /** Hugging Face model id (the Xenova/* mirrors are pre-quantized for transformers.js). */
+  /** Hugging Face model id (onnx-community/* mirrors; see note below on why not Xenova/*). */
   modelId: string;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/web/lib/voice/whisper-web-models.ts` at line 5, Update the stale
top-of-file comment that references "Xenova/*" so it correctly describes the
actual mirror used (onnx-community/*) and the rationale for avoiding Xenova;
locate the comment above the model list (the `modelId` doc comment near the
`WHISPER_MODELS` entries) and replace it with a brief note that entries use
`onnx-community/*` and mention the MatMulNBits/WASM failures as the reason
Xenova mirrors were not used to prevent accidental regression.
apps/backend/internal/user/store/sqlite.go (1)

196-207: 💤 Low value

Cross-codebase sync required: voice mode defaults mirrored on frontend.

The comment on line 197 states "Mirrored on the frontend; keep in sync." This creates a maintenance burden where backend default changes must be manually replicated in the frontend code, and vice versa. If they drift, users see inconsistent defaults depending on whether settings were server-hydrated or client-initialized.

Consider one of:

  • Expose defaults via API (e.g., GET /api/v1/user-settings/voice-mode/defaults)
  • Code-gen frontend defaults from backend source
  • Shared constants file (if monorepo structure allows)

For now, document the exact frontend file/location in the comment so future maintainers know where to update both sides.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/backend/internal/user/store/sqlite.go` around lines 196 - 207, Update
the comment above defaultVoiceModeSettings to document the exact frontend file
and symbol that mirror these defaults so maintainers can sync changes; mention
models.VoiceModeSettings and defaultVoiceModeSettings by name and add the
frontend path and exported constant/function name (e.g., frontend file and
constant that must be updated) and a short note to keep them in sync or expose
via API in future.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/backend/internal/user/service/service_test.go`:
- Around line 515-541: The test for partial updates using applyVoiceMode on a
models.UserSettings with only Engine set is missing assertions for the boolean
fields Enabled and AutoSend; because the service implementation unconditionally
assigns these fields they will be zeroed (false) during this partial update. Add
assertions after the applyVoiceMode call checking settings.VoiceMode.Enabled is
false and settings.VoiceMode.AutoSend is false (using t.Fatalf with clear
messages) so the test documents and locks in the current behavior for Enabled
and AutoSend when only Engine is provided.

In `@apps/web/lib/voice/whisper-web-client.ts`:
- Around line 108-115: The worker error handler currently only rejects the
in-flight promise and leaves a dead worker in place; update the "error" listener
on this.worker to terminate and clear the dead worker and reset state so
ensureWorker()/init/transcribe can recreate it: call this.worker.terminate() (if
available), set this.worker = null, set this.ready = false and
this.loadingModelId = null (and any other loading flags used), then reject and
clear this.pending as you already do; keep the rejection but ensure the
worker/state cleanup happens before resolving the pending logic so future calls
to ensureWorker() will create a fresh worker.

---

Nitpick comments:
In `@apps/backend/internal/user/store/sqlite.go`:
- Around line 196-207: Update the comment above defaultVoiceModeSettings to
document the exact frontend file and symbol that mirror these defaults so
maintainers can sync changes; mention models.VoiceModeSettings and
defaultVoiceModeSettings by name and add the frontend path and exported
constant/function name (e.g., frontend file and constant that must be updated)
and a short note to keep them in sync or expose via API in future.

In `@apps/web/lib/voice/whisper-web-models.ts`:
- Line 5: Update the stale top-of-file comment that references "Xenova/*" so it
correctly describes the actual mirror used (onnx-community/*) and the rationale
for avoiding Xenova; locate the comment above the model list (the `modelId` doc
comment near the `WHISPER_MODELS` entries) and replace it with a brief note that
entries use `onnx-community/*` and mention the MatMulNBits/WASM failures as the
reason Xenova mirrors were not used to prevent accidental regression.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f540a2bb-7d64-4c93-89c1-a5b26a929471

📥 Commits

Reviewing files that changed from the base of the PR and between a34ea71 and feb9f35.

⛔ Files ignored due to path filters (1)
  • apps/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (45)
  • apps/backend/cmd/kandev/helpers.go
  • apps/backend/cmd/kandev/main.go
  • apps/backend/internal/common/config/config.go
  • apps/backend/internal/user/controller/controller.go
  • apps/backend/internal/user/dto/dto.go
  • apps/backend/internal/user/models/models.go
  • apps/backend/internal/user/service/service.go
  • apps/backend/internal/user/service/service_test.go
  • apps/backend/internal/user/store/sqlite.go
  • apps/backend/internal/voice/handlers/transcribe_handlers.go
  • apps/backend/internal/voice/handlers/transcribe_handlers_test.go
  • apps/backend/internal/voice/transcribe/service.go
  • apps/backend/internal/voice/transcribe/service_test.go
  • apps/web/app/settings/voice-mode/page.tsx
  • apps/web/components/settings/editors-settings-state.tsx
  • apps/web/components/settings/keyboard-shortcuts-card.tsx
  • apps/web/components/settings/settings-app-sidebar.tsx
  • apps/web/components/settings/voice-mode-settings.tsx
  • apps/web/components/task/chat/chat-input-body.tsx
  • apps/web/components/task/chat/chat-input-container.tsx
  • apps/web/components/task/chat/chat-input-toolbar.tsx
  • apps/web/components/task/chat/voice-input-button.tsx
  • apps/web/hooks/use-user-display-settings.ts
  • apps/web/hooks/use-voice-input.test.ts
  • apps/web/hooks/use-voice-input.ts
  • apps/web/lib/api/domains/settings-api.ts
  • apps/web/lib/api/domains/voice-api.test.ts
  • apps/web/lib/api/domains/voice-api.ts
  • apps/web/lib/keyboard/constants.ts
  • apps/web/lib/keyboard/shortcut-overrides.test.ts
  • apps/web/lib/keyboard/shortcut-overrides.ts
  • apps/web/lib/ssr/user-settings.test.ts
  • apps/web/lib/ssr/user-settings.ts
  • apps/web/lib/state/slices/settings/settings-slice.ts
  • apps/web/lib/state/slices/settings/types.ts
  • apps/web/lib/types/backend.ts
  • apps/web/lib/types/http-voice.ts
  • apps/web/lib/types/http.ts
  • apps/web/lib/voice/capabilities.test.ts
  • apps/web/lib/voice/capabilities.ts
  • apps/web/lib/voice/whisper-web-client.ts
  • apps/web/lib/voice/whisper-web-models.ts
  • apps/web/lib/ws/handlers/users.ts
  • apps/web/package.json
  • apps/web/workers/whisper-web.worker.ts

Comment thread apps/backend/internal/user/service/service_test.go Outdated
Comment thread apps/web/lib/voice/whisper-web-client.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

9 issues found

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/web/hooks/use-voice-input.ts">

<violation number="1" location="apps/web/hooks/use-voice-input.ts:212">
P2: `cancel()` aborts Web Speech without detaching callbacks, so abort events can still surface as errors/transcripts after the user cancels.</violation>
</file>
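The cancel() issue above can be reproduced without a browser. The sketch below uses a simplified stand-in for the recognition object (RecognitionLike and FakeRecognition are hypothetical, and the real Web Speech handlers receive event objects, not strings) and assumes that abort() may still deliver trailing error and end events, as the violation describes.

```typescript
// Simplified stand-in for a SpeechRecognition instance.
interface RecognitionLike {
  onresult: ((transcript: string) => void) | null;
  onerror: ((code: string) => void) | null;
  onend: (() => void) | null;
  abort(): void;
}

// Detach every callback first, so nothing fired by abort() reaches the hook.
function cancelRecognition(recognition: RecognitionLike): void {
  recognition.onresult = null;
  recognition.onerror = null;
  recognition.onend = null;
  recognition.abort();
}

// Fake that mimics trailing events on abort.
class FakeRecognition implements RecognitionLike {
  onresult: ((transcript: string) => void) | null = null;
  onerror: ((code: string) => void) | null = null;
  onend: (() => void) | null = null;
  abort(): void {
    if (this.onerror) this.onerror("aborted");
    if (this.onend) this.onend();
  }
}

// Collects whatever the callbacks observe after a cancel.
function eventsAfterCancel(detachFirst: boolean): string[] {
  const seen: string[] = [];
  const recognition = new FakeRecognition();
  recognition.onerror = (code) => { seen.push("error:" + code); };
  recognition.onend = () => { seen.push("end"); };
  if (detachFirst) {
    cancelRecognition(recognition);
  } else {
    recognition.abort(); // the behaviour flagged in the violation
  }
  return seen;
}

console.log(eventsAfterCancel(false)); // [ "error:aborted", "end" ]
console.log(eventsAfterCancel(true)); // []
```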

<file name="apps/backend/internal/voice/transcribe/service_test.go">

<violation number="1" location="apps/backend/internal/voice/transcribe/service_test.go:58">
P2: Avoid `t.Fatalf` inside the server handler goroutine; use `t.Errorf` and return instead so the test fails safely without invoking `FailNow` from a non-test goroutine.</violation>
</file>

<file name="apps/backend/internal/voice/handlers/transcribe_handlers.go">

<violation number="1" location="apps/backend/internal/voice/handlers/transcribe_handlers.go:42">
P1: `/api/v1/transcribe` is exposed without authentication or throttling, which allows unauthenticated abuse of the server-side OpenAI key.</violation>

<violation number="2" location="apps/backend/internal/voice/handlers/transcribe_handlers.go:59">
P2: Oversized multipart bodies are misclassified as missing-file 400 errors; return 413 when `FormFile` fails due to body size limits.</violation>
</file>

<file name="apps/web/components/settings/voice-mode-settings.tsx">

<violation number="1" location="apps/web/components/settings/voice-mode-settings.tsx:404">
P2: Shortcut persistence also captures `userSettings` in a closure and re-applies it in the async error path, which can roll back unrelated newer changes. Fetch current store state inside the handler/catch instead of writing the captured snapshot.

(Based on your team's feedback about reading live Zustand state in async handlers to avoid stale-closure races.) [FEEDBACK_USED]</violation>
</file>
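The stale-closure race in the violation above is easy to miss, so here is a small synchronous model of it. SettingsStore is a hypothetical stand-in for the real store and the two field names are invented; the point is only the difference between re-applying a captured snapshot and rolling back the single key that failed to save.

```typescript
interface Settings {
  voiceShortcut: string;
  theme: string;
}

// Hypothetical minimal store with getState/setState semantics.
class SettingsStore {
  private state: Settings;
  constructor(initial: Settings) {
    this.state = initial;
  }
  getState(): Settings {
    return this.state;
  }
  setState(patch: Partial<Settings>): void {
    this.state = { ...this.state, ...patch };
  }
}

// Buggy rollback: writes back the whole snapshot captured before the save.
function rollbackWithSnapshot(store: SettingsStore, snapshot: Settings): void {
  store.setState(snapshot);
}

// Key-scoped rollback: restores only the field this handler changed, so
// concurrent updates to other fields survive.
function rollbackShortcut(store: SettingsStore, previousShortcut: string): void {
  store.setState({ voiceShortcut: previousShortcut });
}

// Simulates: optimistic shortcut change, a concurrent theme change from
// another tab, then a failed save that triggers the rollback.
function simulateFailedSave(keyScoped: boolean): Settings {
  const store = new SettingsStore({ voiceShortcut: "mod+shift+m", theme: "light" });
  const snapshot = store.getState(); // captured by the handler's closure
  store.setState({ voiceShortcut: "mod+shift+v" }); // optimistic update
  store.setState({ theme: "dark" }); // unrelated concurrent update
  if (keyScoped) {
    rollbackShortcut(store, snapshot.voiceShortcut);
  } else {
    rollbackWithSnapshot(store, snapshot);
  }
  return store.getState();
}

console.log(simulateFailedSave(false).theme); // "light": the newer change was clobbered
console.log(simulateFailedSave(true).theme); // "dark": the newer change survived
```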

<file name="apps/backend/internal/voice/transcribe/service.go">

<violation number="1" location="apps/backend/internal/voice/transcribe/service.go:61">
P2: Guard against nil in `WithHTTPClient` to prevent a nil-pointer panic in `Transcribe`.</violation>
</file>

<file name="apps/web/lib/api/domains/voice-api.ts">

<violation number="1" location="apps/web/lib/api/domains/voice-api.ts:28">
P2: `options?.init` is spread after `method`/`body`, so caller-provided init can override the required multipart POST request and break transcription uploads.</violation>
</file>
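The spread-order issue above comes down to which object literal entry wins. The sketch below uses a hypothetical InitLike type in place of the DOM RequestInit so it runs anywhere; buildTranscribeInit is an illustrative name, not the function in voice-api.ts.

```typescript
// Hypothetical subset of a fetch init object.
interface InitLike {
  method?: string;
  body?: unknown;
  headers?: Record<string, string>;
}

// Caller options are spread first, so the required method and body are
// written last and cannot be overridden.
function buildTranscribeInit(body: unknown, callerInit?: InitLike): InitLike {
  return { ...callerInit, method: "POST", body };
}

// Order flagged in the violation: caller options spread last and win.
function buildTranscribeInitUnsafe(body: unknown, callerInit?: InitLike): InitLike {
  return { method: "POST", body, ...callerInit };
}

const caller: InitLike = { method: "GET", headers: { "x-trace": "1" } };
console.log(buildTranscribeInit("audio", caller).method); // "POST"
console.log(buildTranscribeInitUnsafe("audio", caller).method); // "GET"
```

Extra caller fields such as headers still pass through in both versions; only the protected keys differ.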

Bot-review fixes from #1159:
- Map http.MaxBytesError from c.FormFile to 413 instead of bare 400.
- Strip BCP-47 region suffix before passing language hint to Whisper Web.
- Recover from worker crash: terminate + null worker + reset ready flag so
  next init/transcribe creates a fresh worker instead of hanging.
- Read live store state via storeApi.getState() in voice-mode settings
  savers to avoid stale-closure clobbering concurrent settings updates.
- Detach Web Speech callbacks before abort() so trailing onend/onerror
  events don't mutate hook state after cancel().
- Spread caller init before method/body in voice-api fetch so a caller-
  supplied init can't override the multipart POST.
- Replace t.Fatalf with t.Errorf+return inside test HTTP handler goroutine
  (FailNow from non-test goroutine is unsafe).
- Add Enabled/AutoSend zeroing assertions in partial-update test so the
  bool-overwrite-on-PATCH behavior is locked in explicitly.
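
The BCP-47 item above can be sketched roughly as follows. `stripRegionSuffix` is a hypothetical name used only for illustration; the PR's own helper (the review refers to it as `resolveWhisperLang`) may handle more cases, such as script subtags or languages the model does not support.

```typescript
// Hypothetical helper: reduce a BCP-47 tag such as "en-US" to its primary
// language subtag ("en"). This is a sketch, not the PR's implementation.
function stripRegionSuffix(tag: string | undefined): string | undefined {
  if (!tag) return undefined;
  // Accept both "-" and "_" separators and keep only the first subtag.
  const [primary = ""] = tag.trim().split(/[-_]/);
  const normalized = primary.toLowerCase();
  return normalized.length > 0 ? normalized : undefined;
}

const hint = stripRegionSuffix("pt-BR"); // "pt"
```

Returning `undefined` for an empty tag leaves language selection to the transcription engine instead of passing it an invalid hint.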

E2E fix:
- toolbar-overflow spec used button.rounded-full which now matches both
  the send and voice buttons (strict-mode violation). Switched to the
  submit-message-button testid for an unambiguous locator.

claude Bot commented May 29, 2026

Claude finished @jcfs's task in 3m 59s


Code Review — feat: add voice mode for chat input

  • Read skill instructions and gather context
  • Analyze backend changes (voice package, user settings, config)
  • Analyze frontend changes (voice components, hooks, workers)
  • Identify security, correctness, and architecture issues
  • Post findings

Findings

Blocker (must fix before merge)

  1. whisper-web-client.ts has no test file (apps/web/lib/voice/whisper-web-client.ts:194)

    • Issue: No whisper-web-client.test.ts exists. The Worker postMessage/onmessage protocol, pending-request queue, concurrent-call rejection, dispose-during-pending teardown, and blobToWhisperFloat32 audio resample pipeline are all untested.
    • Why: This is a non-trivial API-client class with asynchronous state. A broken pending reset on dispose or a wrong ready check in init() would silently degrade the Whisper Web path with no test failure.
    • Fix: Create apps/web/lib/voice/whisper-web-client.test.ts. Mock the Worker constructor to intercept postMessage and fire onmessage events manually. Cover: init resolves on { type: "ready" }, rejects on { type: "error" }, is a no-op for same model; transcribe throws before init; concurrent call rejected; dispose rejects pending and terminates worker. Use /tdd.
  2. Capture-path tests missing from use-voice-input.test.ts (apps/web/hooks/use-voice-input.test.ts:199)

    • Issue: The 199-line test file covers only the Web Speech path. beginCapture, finishCapture, transcribeViaServer, and transcribeViaWhisperWeb have zero coverage. transcribeAudio is already mocked so the server path requires no real backend.
    • Why: The capture paths contain the most branchy logic in the hook — empty blob handling, 503 error mapping, driverRef nulling timing — and a regression in any of them would not be caught.
    • Fix: Add tests for (a) whisperServer engine: set engine, mock MediaRecorder + getUserMedia, call start() then stop(), assert transcribeAudio was called and onTranscript received the result; (b) empty blob: assert onTranscript is NOT called; (c) 503 error: assert onError called with code: "not-configured". Use /tdd.
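
A test for the first blocker could start from a hand-rolled fake along these lines. This is a minimal sketch under stated assumptions: `FakeWorker` and `MiniWhisperClient` are hypothetical names, the message shapes are reduced to `ready` and `error`, and a real browser `Worker` delivers a `MessageEvent` with a `data` field where this fake passes the payload directly.

```typescript
// Simplified message shapes; the real worker protocol carries more fields.
type FromWorker = { type: "ready" } | { type: "error"; message: string };

class FakeWorker {
  onmessage: ((msg: FromWorker) => void) | null = null;
  posted: unknown[] = [];
  terminated = false;

  postMessage(msg: unknown): void {
    this.posted.push(msg);
  }
  terminate(): void {
    this.terminated = true;
  }
  // Test helper: simulate a message arriving from the worker thread.
  emit(msg: FromWorker): void {
    if (this.onmessage) this.onmessage(msg);
  }
}

// Hypothetical, heavily reduced client with a single pending init request.
class MiniWhisperClient {
  private pending: { resolve: () => void; reject: (e: Error) => void } | null = null;
  private readonly worker: FakeWorker;

  constructor(worker: FakeWorker) {
    this.worker = worker;
    this.worker.onmessage = (msg) => {
      const pending = this.pending;
      this.pending = null;
      if (!pending) return;
      if (msg.type === "ready") pending.resolve();
      else pending.reject(new Error(msg.message));
    };
  }

  init(modelId: string): Promise<void> {
    // Concurrent calls are rejected without posting a second message.
    if (this.pending) return Promise.reject(new Error("init already in progress"));
    return new Promise<void>((resolve, reject) => {
      this.pending = { resolve, reject };
      this.worker.postMessage({ type: "init", modelId });
    });
  }

  dispose(): void {
    // Reject whatever is still pending, then tear the worker down.
    const pending = this.pending;
    this.pending = null;
    if (pending) pending.reject(new Error("disposed"));
    this.worker.terminate();
  }
}
```

With a fake like this, the cases listed in the finding (resolve on `ready`, reject on `error`, concurrent-call rejection, dispose during a pending request) can each be driven by calling `emit` or `dispose` and asserting on the returned promise, `posted`, and `terminated`.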

Suggestion (recommended, doesn't block)

  1. Stale JSDoc comment references Xenova/* (apps/web/lib/voice/whisper-web-models.ts:5)
    • Issue: The modelId JSDoc says "the Xenova/* mirrors are pre-quantized for transformers.js" but all entries use onnx-community/* and lines 13-17 explain exactly why Xenova was abandoned (MatMulNBits/WASM crash).
    • Why: A future contributor could revert entries to Xenova/* to "match" the comment, reintroducing the browser crash.
    • Fix: /** Hugging Face model id (onnx-community/* mirrors; Xenova/* avoided — see MatMulNBits note below). */

Previous review issues — status

All issues from the earlier claude and greptile pass are confirmed fixed in commit 5157743:

  • ✅ 413 vs 400 for oversized bodies (errors.As(err, &maxBytesErr) added)
  • ✅ Dead worker not reset on crash (terminate + null + reset added in ensureWorker error handler)
  • ✅ BCP-47 tags passed to Whisper pipeline (resolveWhisperLang now strips region suffix)
  • ✅ voice-api.ts spread order (caller init now spread before method/body)
  • ✅ Stale closure in useVoiceModeSaver / useShortcutSaver (storeApi.getState() at call time)
  • ✅ t.Fatalf in server handler goroutine (replaced with t.Errorf + return)
  • ✅ Partial update zeroes bool fields: test assertions added
  • ✅ cancel() leaving dangling Web Speech callbacks (null assigned before abort())
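
The spread-order fix in the list above comes down to object-literal precedence: later properties win. A minimal sketch, assuming a simplified `InitLike` type in place of the fetch `RequestInit` and a hypothetical `buildTranscribeInit` helper (neither name is from the PR):

```typescript
// Simplified stand-in for the fetch RequestInit type, so the sketch stays
// self-contained.
interface InitLike {
  method?: string;
  body?: unknown;
  headers?: Record<string, string>;
}

// Hypothetical builder: caller options are spread first, so the required
// multipart POST fields always win over anything the caller supplied.
function buildTranscribeInit(body: unknown, callerInit?: InitLike): InitLike {
  return {
    ...callerInit,
    method: "POST",
    body,
  };
}

const init = buildTranscribeInit("audio-form-data", {
  method: "GET", // would have broken the upload under the old spread order
  headers: { "X-Request-Id": "abc" },
});
// init.method is "POST"; the caller's headers are preserved.
```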

Summary

| Severity | Count |
| --- | --- |
| Blocker | 2 |
| Suggestion | 1 |

Verdict: Blocked — fix the two test-coverage blockers first.

The backend plumbing (transcription service, config, user settings persistence, validation, error hierarchy) is well-structured throughout. The frontend hook, worker, and settings page are clean. The two blockers are purely on test coverage for the WhisperWebClient class and the hook's capture paths — the mocking infrastructure is already in place in the existing test file; these are straightforward additions.


@cubic-dev-ai cubic-dev-ai Bot left a comment


4 issues found across 8 files (changes from recent commits).


…keys

Round 2 of PR review feedback on #1159:
- voice-mode settings rollback now restores only the keys patched by the
  failing request (not the whole snapshot), preserving concurrent edits
  to unrelated voiceMode fields.
- Shortcut rollback diffs previous vs next and reverts only the changed
  keys (restore prior value or delete if absent), preserving other
  concurrent shortcut edits.
- Whisper worker error handler captures the worker reference at attach
  time and only clears refs when this.worker still matches it — a stale
  error from a previously replaced worker no longer nukes the active one.
- Update stale whisper-web-models JSDoc that still pointed at Xenova
  mirrors.
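
The worker identity check described above might look roughly like this. It is a sketch under assumptions: `WorkerLike` and `WhisperWorkerOwner` are hypothetical names, and the real client also manages pending requests and message handling.

```typescript
// Minimal structural stand-in for the browser Worker.
interface WorkerLike {
  onerror: (() => void) | null;
  terminate(): void;
}

// Hypothetical owner class illustrating the stale-error guard.
class WhisperWorkerOwner {
  private worker: WorkerLike | null = null;
  private ready = false;
  private readonly createWorker: () => WorkerLike;

  constructor(createWorker: () => WorkerLike) {
    this.createWorker = createWorker;
  }

  ensureWorker(): WorkerLike {
    if (this.worker) return this.worker;
    const worker = this.createWorker();
    worker.onerror = () => {
      // `worker` is captured at attach time. If it has been replaced since,
      // this is a stale error and must not touch the active worker.
      if (this.worker !== worker) return;
      worker.terminate();
      this.worker = null;
      this.ready = false;
    };
    this.worker = worker;
    return worker;
  }

  markReady(): void {
    this.ready = true;
  }

  isReady(): boolean {
    return this.ready;
  }

  // Drop the current worker and create a fresh one.
  replaceWorker(): WorkerLike {
    this.worker?.terminate();
    this.worker = null;
    this.ready = false;
    return this.ensureWorker();
  }
}
```

Comparing against the captured reference, instead of clearing unconditionally, is what keeps a late error from an old worker from resetting a healthy replacement.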

claude Bot commented May 29, 2026

Claude finished @jcfs's task in 3m 51s


Code Review — feat: add voice mode for chat input

  • Read SKILL.md and gather review context
  • Analyze latest commit (283c001) changes
  • Review all changed files for correctness and quality issues
  • Post findings

Findings

Blocker (must fix before merge)

  1. Model download progress tooltip shows "5000%" instead of "50%" (apps/web/hooks/use-voice-input.ts:344)
    • Issue: p.progress from the Whisper Web worker is 0–100 (transformers.js emits percentages). Line 344 stores it directly in modelLoad.progress, but voice-input-button.tsx:260 renders it as Math.round(modelLoad.progress * 100)% — expecting a 0–1 decimal. A 50% download therefore displays as "5000%".
    • Why: The "ready" state at line 350 already uses the 0–1 convention (progress: 1), so the invariant is established — this path just breaks it. Any user loading a Whisper Web model sees a broken progress tooltip.
    • Fix: divide by 100 at line 344: progress: p.progress / 100
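
The scale mismatch is easy to reproduce in isolation. The sketch below assumes, as the finding states, that the worker reports 0 to 100 and the button expects 0 to 1; `normalizeProgress` and `formatProgress` are hypothetical names, and the clamping is an extra safeguard that the suggested one-line fix does not include.

```typescript
// Convert a 0-100 percentage into the 0-1 fraction the UI expects.
// Clamping guards against out-of-range or non-finite values.
function normalizeProgress(percent: number): number {
  if (!Number.isFinite(percent)) return 0;
  return Math.min(1, Math.max(0, percent / 100));
}

// Mirrors the rendering expression quoted in the finding.
function formatProgress(fraction: number): string {
  return `${Math.round(fraction * 100)}%`;
}

const broken = formatProgress(50); // "5000%": raw percentage treated as a fraction
const fixed = formatProgress(normalizeProgress(50)); // "50%"
```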

Suggestion (recommended, doesn't block)

  1. Stale example model ID in worker wire protocol comment (apps/web/workers/whisper-web.worker.ts:12)

    • Issue: The init message example still shows "Xenova/whisper-base", but the actual model IDs are "onnx-community/*" — and lines 14–18 explain Xenova was deliberately abandoned to avoid the MatMulNBits WASM crash.
    • Fix: update to "onnx-community/whisper-base".
  2. Capture-engine test coverage (apps/web/hooks/use-voice-input.test.ts:199 and apps/web/lib/voice/whisper-web-client.ts)

    • The whisperServer/whisperWeb capture paths and WhisperWebClient's Worker protocol have zero test coverage. Already acknowledged by author as a planned follow-up; tracking here for visibility.

Previous review findings — status

All earlier blockers confirmed addressed as of 5157743 / 283c001:

  • ✅ 413 vs 400 for oversized bodies (errors.As(err, &maxBytesErr) in transcribe_handlers.go:62)
  • ✅ Worker error handler reset to stale-worker-safe identity check (ensureWorker error listener)
  • ✅ BCP-47 → ISO 639-1 strip in resolveWhisperLang (use-voice-input.ts:137)
  • ✅ voice-api.ts spread order (...options?.init now before method/body)
  • ✅ Stale closure in useVoiceModeSaver / useShortcutSaver (storeApi.getState() at call time)
  • ✅ Rollback scoped to changed keys only (not whole snapshot)
  • ✅ t.Fatalf in server goroutine replaced with t.Errorf + return
  • ✅ Partial update bool assertions added to service_test.go
  • ✅ abortDriver nulls Web Speech callbacks before abort()
  • ✅ Stale Xenova JSDoc in whisper-web-models.ts
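
The callback-detach item in the list above can be illustrated with a small stand-in. `RecognitionLike` is a simplified, hypothetical shape (the real `SpeechRecognition` handlers receive event objects), and `detachAndAbort` is an illustrative name, not the PR's `abortDriver`.

```typescript
// Simplified stand-in for the Web Speech recognition object.
interface RecognitionLike {
  onresult: ((transcript: string) => void) | null;
  onend: (() => void) | null;
  onerror: ((code: string) => void) | null;
  abort(): void;
}

// Clear every handler first, then abort. Any trailing end or error event
// fired by abort() finds no listener and cannot mutate hook state.
function detachAndAbort(recognition: RecognitionLike): void {
  recognition.onresult = null;
  recognition.onend = null;
  recognition.onerror = null;
  recognition.abort();
}
```

The ordering matters: calling `abort()` before clearing the handlers would still let the trailing events reach the old callbacks.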

Summary

| Severity | Count |
| --- | --- |
| Blocker | 1 |
| Suggestion | 2 |

Verdict: Blocked — fix the progress scale bug before merge. One-liner fix at use-voice-input.ts:344.


@cubic-dev-ai cubic-dev-ai Bot left a comment


2 issues found across 3 files (changes from recent commits).


Round 3 of PR review feedback on #1159:
- Normalise Whisper progress to 0–1 in ensureWhisperClient (transformers.js
  emits 0–100, the button's display + the `ready: 1` convention expect
  0–1 — fixes the 5000% mid-download display).
- Claim driverRef synchronously at the top of finishCapture so concurrent
  pointerup + pointerleave invocations early-return instead of racing and
  clobbering a freshly-started recording's ref.
- Race-aware rollback: voice-mode and shortcut catch branches now skip
  reverting a key when latest[key] !== next[key]. A failed earlier request
  no longer overwrites a later successful save to the same key.
- Update the Worker wire-protocol comment to show onnx-community/whisper-base
  instead of the stale Xenova/* example.
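
The race-aware, key-scoped rollback from rounds 2 and 3 can be sketched as a pure function. This is an assumption-laden illustration: `rollbackFailedPatch` is a hypothetical name, settings are modelled as a flat record, and the engine values are examples only.

```typescript
type Settings = Record<string, unknown>;

// latest:   store state read at the time the request fails
// previous: state before the optimistic update
// patch:    only the keys the failing request tried to change
function rollbackFailedPatch(latest: Settings, previous: Settings, patch: Settings): Settings {
  const result: Settings = { ...latest };
  for (const key of Object.keys(patch)) {
    // A later write changed this key again; leave it alone.
    if (latest[key] !== patch[key]) continue;
    // Restore the prior value, or delete the key if it did not exist before.
    if (key in previous) result[key] = previous[key];
    else delete result[key];
  }
  return result;
}

const previous: Settings = { engine: "webSpeech", autoSend: false };
const patch: Settings = { engine: "whisperWeb" };
// While the request was in flight, another edit flipped autoSend.
const latest: Settings = { engine: "whisperWeb", autoSend: true };
const restored = rollbackFailedPatch(latest, previous, patch);
// restored: { engine: "webSpeech", autoSend: true }
```

Only `engine` is reverted; the concurrent `autoSend` edit survives because it was never part of the failing patch.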

claude Bot commented May 29, 2026

Claude finished @jcfs's task in 3m 54s


Code Review — feat: add voice mode for chat input

  • Read skill instructions and gather context
  • Analyze latest commit (ffe88a6) changes
  • Review all changed files for new issues
  • Post findings

Findings

Previous blockers — all resolved in ffe88a6

All issues raised in the prior review passes are confirmed fixed:

  • ✅ Progress scale: p.progress / 100 at use-voice-input.ts:353 — 5000% display gone
  • ✅ Hold-mode double-stop race: driverRef.current = null now claimed synchronously before the first await in finishCapture (use-voice-input.ts:314)
  • ✅ Race-aware rollback: voice-mode and shortcut catch branches skip reverting a key if latest[key] !== next[key] (voice-mode-settings.tsx:108, voice-mode-settings.tsx:439)
  • ✅ Worker wire-protocol comment updated to onnx-community/whisper-base
  • ✅ Whisper models JSDoc updated (whisper-web-models.ts:5)
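
The double-stop fix relies on the fact that an async function runs synchronously up to its first `await`. A minimal sketch, assuming a hypothetical `CaptureDriver` shape and a plain object in place of a React ref:

```typescript
// Hypothetical driver shape; the real hook wraps MediaRecorder.
interface CaptureDriver {
  stop(): Promise<string>;
}

// Plain object standing in for a React ref.
const driverRef: { current: CaptureDriver | null } = { current: null };

async function finishCapture(onTranscript: (text: string) => void): Promise<void> {
  const driver = driverRef.current;
  if (!driver) return; // a concurrent call already claimed the driver
  driverRef.current = null; // claim synchronously, before any await
  const text = await driver.stop();
  if (text) onTranscript(text);
}
```

If pointerup and pointerleave both trigger `finishCapture`, the second call sees `null` and returns early, so it can neither stop the recording twice nor clear a ref that a newly started recording has since populated.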

Deferred (acknowledged by author, will be tracked in follow-up)

  1. whisper-web-client.ts has no test file (apps/web/lib/voice/whisper-web-client.ts)

    • The Worker postMessage/onmessage protocol, pending-request queue, concurrent-call rejection, dispose-during-pending teardown, and blobToWhisperFloat32 audio resample pipeline are untested.
    • Per project rules this is a blocker. Author has explicitly committed to opening a follow-up issue.
  2. Capture-path tests missing (apps/web/hooks/use-voice-input.test.ts)

    • whisperServer and whisperWeb engine paths (beginCapture, finishCapture, transcribeViaServer, transcribeViaWhisperWeb) have zero coverage. The mock infrastructure (transcribeAudio via vi.mock, fake MediaRecorder, fake getUserMedia) is already partially established in the file.
    • Per project rules this is a blocker. Author has explicitly committed to opening a follow-up issue.

Summary

| Severity | Count |
| --- | --- |
| Deferred blocker | 2 |

Verdict: All runtime bugs are fixed. The only open items are the two test-coverage gaps the author has explicitly deferred to a follow-up issue. The backend package (transcription service, config, user settings persistence, error hierarchy) is solid throughout. The frontend hook, settings page, worker, and rollback logic are now in good shape after the four rounds of fixes.
