feat: add voice mode for chat input by jcfs · Pull Request #1159 · kdlbs/kandev

jcfs · 2026-05-29T16:24:04Z

Adds voice input to the chat composer with three transcription engines (in-browser Web Speech, in-browser Whisper via transformers.js, server-side OpenAI Whisper fallback) and a /settings/voice-mode page so users can pick engine, language, activation mode, auto-send, model size, and shortcut — or turn the whole feature off.

Important Changes

New internal/voice backend package with a POST /api/v1/transcribe endpoint behind KANDEV_VOICE_OPENAI_API_KEY (returns 503 when unset; frontend gracefully degrades).
User settings gain a nested voice_mode JSON object — no schema migration; old rows default enabled to true via a *bool storage adapter.
Whisper Web runs in a Web Worker using @huggingface/transformers with fp16 decoder weights pulled from onnx-community/whisper-* (sidesteps the MatMulNBits scale-missing crash that q4/q8 hit on WASM).
Voice setting changes broadcast over the existing user.settings.updated WS event so toggling on one tab updates every open chat tab live.

Validation

make -C apps/backend test lint — all packages pass, 0 lint issues.
cd apps && pnpm --filter @kandev/web typecheck lint test — green, 2517 tests pass.
New tests: voice/transcribe, voice/handlers, useVoiceInput, parseVoiceMode, detectVoiceCapabilities, resolveActiveEngine, transcribeAudio, applyVoiceMode.
Manual smoke: localhost desktop (Web Speech path), mobile Safari over Cloudflare tunnel (Whisper Web path with fp16 model download).

Possible Improvements

Low risk for non-voice flows (additive). The Whisper Web first-load is ~75 MB on Base — would be smaller if we ever wire WebGPU + q4 detection. Server-side Whisper currently only reads a global KANDEV_VOICE_OPENAI_API_KEY; a per-user/agent credential lookup is a natural follow-up but explicitly out of scope here.

Checklist

I have performed a self-review of my code.
I have manually tested my changes and they work as expected.
My changes have tests that cover the new functionality and edge cases.
If my change touches UI files (apps/web/), I have added or updated Playwright e2e tests in apps/web/e2e/ and verified them with make test-e2e.

Preview Environment


URL	https://kandev-pr-1159-bwo7.sprites.app
Commit	`ffe88a6`
Agent	Mock agent

Updates automatically on each push. Destroyed when the PR is closed.

Adds a voice input button to the chat composer with three transcription engines: in-browser Web Speech API, in-browser Whisper via @huggingface/transformers, and a server-side OpenAI Whisper fallback gated on KANDEV_VOICE_OPENAI_API_KEY.

User-configurable from a new /settings/voice-mode page:
- Master on/off toggle (defaults to on; opt-out, not opt-in)
- Engine preference (auto / Web Speech / Whisper Web / Whisper Server)
- Language hint (auto-detect or BCP-47 pinned)
- Activation mode (click-to-toggle or hold-to-talk)
- Auto-send after transcription
- Whisper Web model size (tiny / base / small)
- Editable Cmd+Shift+M keyboard shortcut

Voice settings live under user_settings.voice_mode as a nested JSON object, with no schema migration. Setting changes broadcast via the existing user.settings.updated WS event so any open chat tab updates live.

The mic button is rendered on the right of the composer alongside the send button (round primary fill; destructive fill with pulse ring while recording). On browsers without a usable engine (e.g. iOS Chrome over HTTP) the button stays visible as a muted greyed icon; tapping it surfaces a toast with the reason (typically a missing secure context).

Whisper Web runs in a Web Worker with fp16 decoder weights (avoids the MatMulNBits scale-missing crash that q4/q8 hit on WASM). Models are pulled from the onnx-community/* mirror and cached by transformers.js in Cache Storage on first use.

coderabbitai · 2026-05-29T16:24:20Z

Walkthrough

Adds full-stack voice input: server-side transcription fallback (OpenAI Whisper), client engines (Web Speech, Whisper Web, Whisper Server), user settings UI and persistence, chat composer integration with transcript insertion and optional auto-send, keyboard shortcut, worker for in-browser Whisper, and tests across backend and frontend.

Changes

Voice Input & Transcription Feature

Layer / File(s)	Summary
Backend Configuration & Voice Settings Models `apps/backend/internal/common/config/config.go`, `apps/backend/internal/user/models/models.go`, `apps/backend/internal/user/dto/dto.go`	`Config` gains `Voice` section with OpenAI API key; `UserSettings` adds `VoiceMode` with engine, language, activation mode, auto-send, and Whisper model preferences; DTO updated to serialize/deserialize voice settings.
Backend Transcription Service & Handler `apps/backend/internal/voice/transcribe/service.go`, `apps/backend/internal/voice/handlers/transcribe_handlers.go`, `apps/backend/internal/voice/transcribe/service_test.go`, `apps/backend/internal/voice/handlers/transcribe_handlers_test.go`	New `transcribe.Service` calls OpenAI Whisper endpoint with multipart audio, handles `ErrNotConfigured`, returns `UpstreamError` for non-2xx responses. HTTP handler validates configuration, enforces size limits, parses multipart audio, and returns `503`/`502`/`500` based on error type. Full test coverage for success and error paths.
Backend User Settings Orchestration & Persistence `apps/backend/internal/user/service/service.go`, `apps/backend/internal/user/controller/controller.go`, `apps/backend/internal/user/store/sqlite.go`, `apps/backend/cmd/kandev/...`	Service validates voice mode update fields (engine, mode, model), normalizes language, preserves existing values for partial updates, publishes event with `voice_mode`. Controller forwards voice mode from request. SQLite layer persists/hydrates `voice_mode` JSON with defaults for legacy rows. Configuration wiring passes OpenAI key through to handler.
Frontend Type System & Voice API Client `apps/web/lib/types/http-voice.ts`, `apps/web/lib/types/http.ts`, `apps/web/lib/types/backend.ts`, `apps/web/lib/api/domains/settings-api.ts`, `apps/web/lib/api/domains/voice-api.ts`	TypeScript types for voice engines (`webSpeech`/`whisperWeb`/`whisperServer`/`auto`), activation modes, and Whisper model sizes. `transcribeAudio` API client posts multipart FormData to backend, extracts error messages, throws `ApiError` on failure. User settings update payload now includes optional `voice_mode` field.
Frontend Voice Capture & Processing Engine `apps/web/hooks/use-voice-input.ts`, `apps/web/hooks/use-voice-input.test.ts`, `apps/web/lib/voice/capabilities.ts`, `apps/web/lib/voice/capabilities.test.ts`, `apps/web/lib/voice/whisper-web-client.ts`, `apps/web/lib/voice/whisper-web-models.ts`, `apps/web/workers/whisper-web.worker.ts`	Core `useVoiceInput` hook supports Web Speech API and audio capture with server/in-browser transcription; manages state transitions (idle → requesting → recording → processing), error mapping, and model-load progress. Capability detection identifies available engines with SSR safety. `WhisperWebClient` wraps worker communication with single-request concurrency and progress tracking. Web Worker loads Hugging Face Transformers ASR pipeline, handles init/transcribe/dispose messages.
Frontend Voice Settings Page & Configuration UI `apps/web/app/settings/voice-mode/page.tsx`, `apps/web/components/settings/voice-mode-settings.tsx`, `apps/web/components/settings/settings-app-sidebar.tsx`, `apps/web/components/settings/keyboard-shortcuts-card.tsx`	New voice mode settings page fetches initial state from server, renders enable toggle, engine card with capability-aware options, language/mode/auto-send selections, Whisper model picker, availability banner for unsupported contexts, and keyboard shortcut recorder. Sidebar refactored with reusable `SimpleSidebarRows` helper; adds "Voice Mode" entry under microphone icon. `ShortcutRecorder` exported for reuse.
Frontend Chat Input Voice Integration `apps/web/components/task/chat/voice-input-button.tsx`, `apps/web/components/task/chat/chat-input-container.tsx`, `apps/web/components/task/chat/chat-input-body.tsx`, `apps/web/components/task/chat/chat-input-toolbar.tsx`	New `VoiceInputButton` returns null when disabled, renders unsupported fallback for non-HTTPS contexts, otherwise shows enabled button wired to `useVoiceInput` hook with hold/toggle mode support, error toasts, recording indicator, and keyboard shortcut toggling. Chat input container inserts transcribed text at cursor with smart spacing, optionally auto-sends. Editor area and toolbar props extended to accept voice callbacks.
Frontend State Management & SSR Integration `apps/web/lib/state/slices/settings/types.ts`, `apps/web/lib/state/slices/settings/settings-slice.ts`, `apps/web/lib/ssr/user-settings.ts`, `apps/web/lib/ssr/user-settings.test.ts`, `apps/web/lib/ws/handlers/users.ts`, `apps/web/lib/keyboard/constants.ts`, `apps/web/lib/keyboard/shortcut-overrides.ts`, `apps/web/hooks/use-user-display-settings.ts`, `apps/web/package.json`	Settings state slice initializes `voiceMode` from `DEFAULT_VOICE_MODE_STATE`. SSR parser converts backend snake_case to camelCase with defaults. WebSocket handler updates voice mode on `user.settings.updated` event. New `VOICE_INPUT_TOGGLE` keyboard shortcut (Ctrl/Cmd+Shift+M) added to configurable list. Carry-forward logic defaults voice mode when missing. Added `@huggingface/transformers` dependency.
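The SSR parsing step in the last row (backend snake_case converted to camelCase, with defaults for missing fields) might look roughly like the sketch below. Only the engine names, the model sizes, and the rule that `enabled` defaults to true come from the PR text; the other field names, the `"toggle"`/`"hold"` values, and the remaining defaults are assumptions made for illustration.

```typescript
// Hypothetical shapes: not copied from apps/web/lib/ssr/user-settings.ts.
type VoiceEngine = "auto" | "webSpeech" | "whisperWeb" | "whisperServer";
type ActivationMode = "toggle" | "hold"; // assumed value names
type WhisperModel = "tiny" | "base" | "small";

interface VoiceModeState {
  enabled: boolean;
  engine: VoiceEngine;
  language: string;
  activationMode: ActivationMode;
  autoSend: boolean;
  whisperModel: WhisperModel;
}

const ENGINES: readonly VoiceEngine[] = ["auto", "webSpeech", "whisperWeb", "whisperServer"];
const MODES: readonly ActivationMode[] = ["toggle", "hold"];
const MODELS: readonly WhisperModel[] = ["tiny", "base", "small"];

const DEFAULT_VOICE_MODE_STATE: VoiceModeState = {
  enabled: true, // opt-out, as stated in the PR description
  engine: "auto",
  language: "auto", // assumed sentinel for auto-detect
  activationMode: "toggle",
  autoSend: false,
  whisperModel: "base",
};

// Returns `value` if it is one of the allowed strings, otherwise the fallback.
function oneOf<T extends string>(value: unknown, allowed: readonly T[], fallback: T): T {
  for (const candidate of allowed) {
    if (candidate === value) return candidate;
  }
  return fallback;
}

function asBool(value: unknown, fallback: boolean): boolean {
  return typeof value === "boolean" ? value : fallback;
}

// Converts the backend's snake_case object into camelCase state, falling back
// to defaults for anything missing or malformed (e.g. legacy rows).
function parseVoiceMode(raw: unknown): VoiceModeState {
  const d = DEFAULT_VOICE_MODE_STATE;
  if (typeof raw !== "object" || raw === null) return { ...d };
  const r = raw as Record<string, unknown>;
  const language = r["language"];
  return {
    enabled: asBool(r["enabled"], d.enabled),
    engine: oneOf(r["engine"], ENGINES, d.engine),
    language: typeof language === "string" && language !== "" ? language : d.language,
    activationMode: oneOf(r["activation_mode"], MODES, d.activationMode),
    autoSend: asBool(r["auto_send"], d.autoSend),
    whisperModel: oneOf(r["whisper_model"], MODELS, d.whisperModel),
  };
}
```

Validating each field separately means a single unknown value (say, an engine name from a newer backend) falls back on its own instead of discarding the whole object.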

Sequence Diagrams

sequenceDiagram
  participant User
  participant Chat as Chat Composer
  participant Button as Voice Button
  participant Hook as useVoiceInput
  participant WebSpeech as Web Speech API
  participant Server as Backend /api/v1/transcribe
  participant OpenAI as OpenAI Whisper

  User->>Chat: press voice shortcut or click voice button
  Chat->>Button: trigger onVoiceTranscript/onAutoSend
  Button->>Hook: start()
  Hook->>WebSpeech: SpeechRecognition.start() or beginCapture()
  WebSpeech-->>Hook: onresult with transcript
  Hook->>Chat: onTranscript("hello world")
  Chat->>Chat: insert at cursor
  Hook->>Server: optional transcribe via server (capture path)
  Server->>OpenAI: multipart upload
  OpenAI-->>Server: {text: ...}
  Server-->>Hook: {text: ...}
  Hook-->>Chat: onTranscript("...from server")

sequenceDiagram
  participant Browser
  participant Worker as Whisper Web Worker
  participant HF as Hugging Face CDN

  Browser->>Worker: {type: 'init', model: 'base'}
  Worker->>HF: fetch Transformers model files
  HF-->>Worker: model bytes, emit progress
  Worker-->>Browser: {type: 'ready'}
  Browser->>Worker: {type: 'transcribe', audio: Float32Array}
  Worker->>Worker: pipeline.transcribe(audio)
  Worker-->>Browser: {type: 'result', text: '...'}

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested Labels

codex

Suggested Reviewers

zeval

🐰 A whisper, a click, a captured thought,
From voices bright to code well-wrought,
The microphone now hears your call,
While toggles dance and settings sprawl. 🎙️✨

claude · 2026-05-29T16:24:27Z

Claude finished @jcfs's task in 7m 46s.

Code Review — feat: add voice mode for chat input

Read skill instructions and gather context
Analyze backend changes (voice package, user settings, config)
Analyze frontend changes (voice components, hooks, workers)
Identify security, correctness, and architecture issues
Post findings

Findings

Blocker (must fix before merge)

Capture engine paths untested — apps/web/hooks/use-voice-input.test.ts:199
- Issue: The test file only exercises the Web Speech path. beginCapture, finishCapture, stopCapture, transcribeViaServer, and transcribeViaWhisperWeb have zero test coverage. The transcribeAudio mock is already wired but never triggered.
- Why: The whisperServer and whisperWeb paths contain the most branchy logic in the hook — empty blob handling, 503 error mapping, driverRef nulling timing — none of which is exercised.
- Fix: Add tests for the whisperServer engine (mock MediaRecorder + getUserMedia, verify transcribeAudio is called and the transcript is emitted), the empty-blob case (no onTranscript call), and the 503 error path (emits code: "not-configured").
WhisperWebClient has no test file — apps/web/lib/voice/whisper-web-client.ts
- Issue: The Worker messaging protocol, pending-request queue, and blobToWhisperFloat32 audio-resample function are all untested. There is no whisper-web-client.test.ts.
- Why: This is an API-client class with non-trivial logic: it manages a single-inflight-request queue, maps Worker postMessage/onmessage to Promises, and handles dispose-during-pending. A broken pending reset on dispose or a wrong ready check in init() would silently degrade the Whisper Web path.
- Fix: Create apps/web/lib/voice/whisper-web-client.test.ts. Mock the Worker constructor to intercept postMessage and fire onmessage events manually. Cover: init resolves on { type: "ready" }, rejects on { type: "error" }, and is a no-op when the model is already loaded; transcribe rejects before init; a concurrent call is rejected; dispose rejects the pending request.
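The testing approach described in the second blocker (swap the Worker for a fake, then fire messages by hand) can be sketched against a reduced stand-in client. `MiniWhisperClient`, `FakeWorker`, and `WorkerLike` are hypothetical names and are not the PR's `WhisperWebClient`; the message shapes follow the `init`/`ready`/`transcribe`/`result`/`error` types named in the reviews, and the error strings are made up.

```typescript
type WorkerRequest =
  | { type: "init"; model: string }
  | { type: "transcribe"; audio: Float32Array };
type WorkerResponse =
  | { type: "ready" }
  | { type: "result"; text: string }
  | { type: "error"; message: string };

// Minimal surface of a Worker that the client depends on.
interface WorkerLike {
  postMessage(msg: WorkerRequest): void;
  onmessage: ((event: { data: WorkerResponse }) => void) | null;
  terminate(): void;
}

// Test double: records outgoing messages and lets a test emit replies.
class FakeWorker implements WorkerLike {
  sent: WorkerRequest[] = [];
  terminated = false;
  onmessage: ((event: { data: WorkerResponse }) => void) | null = null;
  postMessage(msg: WorkerRequest): void {
    this.sent.push(msg);
  }
  terminate(): void {
    this.terminated = true;
  }
  emit(data: WorkerResponse): void {
    this.onmessage?.({ data });
  }
}

interface Pending {
  resolve: (text: string) => void;
  reject: (err: Error) => void;
}

class MiniWhisperClient {
  private worker: WorkerLike;
  private ready = false;
  private pending: Pending | null = null;

  constructor(worker: WorkerLike) {
    this.worker = worker;
    worker.onmessage = (event) => this.handle(event.data);
  }

  // Settles the single in-flight request with whatever the worker replied.
  private handle(data: WorkerResponse): void {
    if (data.type === "ready") this.ready = true;
    const p = this.pending;
    if (!p) return;
    this.pending = null;
    if (data.type === "error") p.reject(new Error(data.message));
    else p.resolve(data.type === "result" ? data.text : "");
  }

  // Single-inflight guard: a second call while one is pending is rejected.
  private request(msg: WorkerRequest): Promise<string> {
    if (this.pending) return Promise.reject(new Error("busy"));
    return new Promise<string>((resolve, reject) => {
      this.pending = { resolve, reject };
      this.worker.postMessage(msg);
    });
  }

  init(model: string): Promise<string> {
    return this.request({ type: "init", model });
  }

  transcribe(audio: Float32Array): Promise<string> {
    if (!this.ready) return Promise.reject(new Error("not initialised"));
    return this.request({ type: "transcribe", audio });
  }

  // Rejects any pending request so callers do not hang, then tears down.
  dispose(): void {
    this.pending?.reject(new Error("disposed"));
    this.pending = null;
    this.ready = false;
    this.worker.terminate();
  }
}
```

Injecting the worker through the constructor is one way to make the protocol testable without a browser; the real client may instead construct its Worker internally, in which case the test would stub the global `Worker` constructor as the review suggests.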

Suggestion (recommended, doesn't block)

Misleading 400 when body exceeds 10 MB — apps/backend/internal/voice/handlers/transcribe_handlers.go:57-61
- Issue: When MaxBytesReader fires during multipart parsing, c.FormFile("audio") returns an error that is mapped to 400 "audio file is required". The correct status for an oversized payload is 413 StatusRequestEntityTooLarge.
- Fix: Check errors.As(err, new(*http.MaxBytesError)) before the generic 400 and return 413 instead. See inline comment for the exact snippet.

Summary

Severity	Count
Blocker	2
Suggestion	1

Verdict: Blocked — fix blockers first.

The backend work (transcription service, config, user settings persistence, validation) is well-structured: the *bool storage adapter for enabled backward-compat, the mergeVoiceModeDefaults approach, the MaxBytesReader cap, and the error type hierarchy are all solid. The two blockers are purely on test coverage for the frontend capture paths — use /tdd to fill those gaps before merge.

greptile-apps · 2026-05-29T16:30:34Z

Greptile Summary

Adds a three-engine voice input feature to the chat composer (Web Speech API, in-browser Whisper via @huggingface/transformers, and a server-side OpenAI Whisper fallback) along with a /settings/voice-mode page for user configuration. The implementation is additive with careful degradation: the server endpoint returns 503 when unconfigured, old user rows default enabled to true via a *bool storage adapter, and voice settings broadcast live over the existing user.settings.updated WebSocket event.

Backend: New internal/voice package with POST /api/v1/transcribe (unauthenticated, 10 MB cap, key-guarded 503), plus VoiceModeSettings wired into the user settings store/service with full validation.
Frontend: useVoiceInput hook abstracts all three engines behind a unified start/stop/cancel API; WhisperWebClient proxies the transformers.js Web Worker; VoiceInputButton handles toggle/hold activation, keyboard shortcut, auto-send, and browser-capability fallback.

Confidence Score: 5/5

Safe to merge. The change is purely additive — all new paths are gated behind the voice_mode.enabled user setting, which defaults to true but can be toggled off per user. Non-voice flows are unaffected.

The backend is well-guarded (503 when unconfigured, 10 MB cap, enum validation on all voice settings fields), the frontend gracefully degrades across all three engine paths, and the settings broadcast correctly propagates changes across tabs. Only cosmetic and documentation-level concerns were found.

No files require special attention. The pnpm lock file is worth a second look for the dev-snapshot ONNX Runtime packages, but this is a transitive dependency from @huggingface/transformers and doesn't affect correctness.

Important Files Changed

Filename	Overview
apps/backend/internal/voice/handlers/transcribe_handlers.go	New HTTP handler for POST /api/v1/transcribe; size limiting via MaxBytesReader with correct 413 handling, 503 short-circuit when unconfigured.
apps/backend/internal/voice/transcribe/service.go	OpenAI Whisper proxy; well-typed error handling (ErrNotConfigured, UpstreamError), MIME→extension mapping, and functional-options pattern for testability.
apps/backend/internal/user/store/sqlite.go	Adds voice_mode persistence using storedVoiceMode (*bool for enabled) to correctly handle pre-existing rows that predate the column.
apps/web/hooks/use-voice-input.ts	Core voice hook; driver abstraction (webSpeech / capture) is clean. whisperErrorMessage always returns code "model-load" for any Whisper error, including runtime transcription failures.
apps/web/lib/voice/whisper-web-client.ts	Robust Worker client with single-inflight guard, stale-worker error protection, and clean audio resample path.
apps/web/workers/whisper-web.worker.ts	transformers.js worker using fp16 decoder dtype to avoid MatMulNBits crash on WASM; model reuse guard prevents redundant downloads.
apps/web/lib/voice/whisper-web-models.ts	Model ID registry for onnx-community mirrors; contains a stale comment saying "q8 variant" while the worker actually uses fp16.
apps/web/components/task/chat/voice-input-button.tsx	Toggle/hold activation, keyboard shortcut binding, auto-send via rAF, and accessibility labels are all well-implemented.
apps/web/components/settings/voice-mode-settings.tsx	Settings page with optimistic updates, key-scoped rollback on failure, and live capability detection; correctly uses storeApi to avoid stale closures.
apps/pnpm-lock.yaml	Adds @huggingface/transformers@4.2.0 which pulls dev-build snapshots of onnxruntime-web and onnxruntime-common; also surfaces a CWE-502 deprecation on @ungap/structured-clone@1.3.0.

Sequence Diagram

sequenceDiagram
    participant User
    participant VoiceInputButton
    participant useVoiceInput
    participant WebSpeechAPI
    participant WhisperWebClient
    participant Worker as whisper-web.worker
    participant BackendAPI as /api/v1/transcribe
    participant OpenAI as OpenAI Whisper

    User->>VoiceInputButton: click / hold / shortcut
    VoiceInputButton->>useVoiceInput: start()
    useVoiceInput->>useVoiceInput: resolveActiveEngine()

    alt "engine = webSpeech"
        useVoiceInput->>WebSpeechAPI: recognition.start()
        WebSpeechAPI-->>useVoiceInput: onresult (transcript)
        useVoiceInput-->>VoiceInputButton: onTranscript(text)
    else "engine = whisperWeb"
        useVoiceInput->>useVoiceInput: getUserMedia() → MediaRecorder.start()
        User->>VoiceInputButton: stop
        useVoiceInput->>WhisperWebClient: transcribe(blob)
        WhisperWebClient->>WhisperWebClient: blobToWhisperFloat32()
        WhisperWebClient->>Worker: "postMessage({type:transcribe, audio})"
        Worker->>Worker: asrPipeline(audio)
        Worker-->>WhisperWebClient: "postMessage({type:result, text})"
        WhisperWebClient-->>useVoiceInput: text
        useVoiceInput-->>VoiceInputButton: onTranscript(text)
    else "engine = whisperServer"
        useVoiceInput->>useVoiceInput: getUserMedia() → MediaRecorder.start()
        User->>VoiceInputButton: stop
        useVoiceInput->>BackendAPI: POST /api/v1/transcribe
        BackendAPI->>OpenAI: POST /v1/audio/transcriptions
        OpenAI-->>BackendAPI: "{text}"
        BackendAPI-->>useVoiceInput: "{text}"
        useVoiceInput-->>VoiceInputButton: onTranscript(text)
    end

    VoiceInputButton->>VoiceInputButton: insertText into editor
    opt auto-send enabled
        VoiceInputButton->>VoiceInputButton: rAF → handleVoiceAutoSend()
    end

Reviews (4): Last reviewed commit: "fix: voice mode progress scale, hold-mod..."

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

apps/web/lib/voice/whisper-web-models.ts (1)
5-5: ⚡ Quick win

Stale modelId comment contradicts the onnx-community rationale below.

This doc says the Xenova/* mirrors are used, but every entry uses onnx-community/* and lines 13-17 explain Xenova was deliberately avoided (MatMulNBits/WASM failures). Leaving this could prompt a regression back to Xenova.
📝 Align the comment with the actual mirror
-  /** Hugging Face model id (the Xenova/* mirrors are pre-quantized for transformers.js). */
+  /** Hugging Face model id (onnx-community/* mirrors; see note below on why not Xenova/*). */
   modelId: string;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/web/lib/voice/whisper-web-models.ts` at line 5, Update the stale
top-of-file comment that references "Xenova/*" so it correctly describes the
actual mirror used (onnx-community/*) and the rationale for avoiding Xenova;
locate the comment above the model list (the `modelId` doc comment near the
`WHISPER_MODELS` entries) and replace it with a brief note that entries use
`onnx-community/*` and mention the MatMulNBits/WASM failures as the reason
Xenova mirrors were not used to prevent accidental regression.
apps/backend/internal/user/store/sqlite.go (1)
196-207: 💤 Low value

Cross-codebase sync required: voice mode defaults mirrored on frontend.

The comment on line 197 states "Mirrored on the frontend; keep in sync." This creates a maintenance burden where backend default changes must be manually replicated in the frontend code, and vice versa. If they drift, users see inconsistent defaults depending on whether settings were server-hydrated or client-initialized.

Consider one of:

Expose defaults via API (e.g., GET /api/v1/user-settings/voice-mode/defaults)

Code-gen frontend defaults from backend source

Shared constants file (if monorepo structure allows)

For now, document the exact frontend file/location in the comment so future maintainers know where to update both sides.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/backend/internal/user/store/sqlite.go` around lines 196 - 207, Update
the comment above defaultVoiceModeSettings to document the exact frontend file
and symbol that mirror these defaults so maintainers can sync changes; mention
models.VoiceModeSettings and defaultVoiceModeSettings by name and add the
frontend path and exported constant/function name (e.g., frontend file and
constant that must be updated) and a short note to keep them in sync or expose
via API in future.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/backend/internal/user/service/service_test.go`:
- Around line 515-541: The test for partial updates using applyVoiceMode on a
models.UserSettings with only Engine set is missing assertions for the boolean
fields Enabled and AutoSend; because the service implementation unconditionally
assigns these fields they will be zeroed (false) during this partial update. Add
assertions after the applyVoiceMode call checking settings.VoiceMode.Enabled is
false and settings.VoiceMode.AutoSend is false (using t.Fatalf with clear
messages) so the test documents and locks in the current behavior for Enabled
and AutoSend when only Engine is provided.

In `@apps/web/lib/voice/whisper-web-client.ts`:
- Around line 108-115: The worker error handler currently only rejects the
in-flight promise and leaves a dead worker in place; update the "error" listener
on this.worker to terminate and clear the dead worker and reset state so
ensureWorker()/init/transcribe can recreate it: call this.worker.terminate() (if
available), set this.worker = null, set this.ready = false and
this.loadingModelId = null (and any other loading flags used), then reject and
clear this.pending as you already do; keep the rejection but ensure the
worker/state cleanup happens before resolving the pending logic so future calls
to ensureWorker() will create a fresh worker.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between a34ea71 and feb9f35.

⛔ Files ignored due to path filters (1)

apps/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (45)

apps/backend/cmd/kandev/helpers.go
apps/backend/cmd/kandev/main.go
apps/backend/internal/common/config/config.go
apps/backend/internal/user/controller/controller.go
apps/backend/internal/user/dto/dto.go
apps/backend/internal/user/models/models.go
apps/backend/internal/user/service/service.go
apps/backend/internal/user/service/service_test.go
apps/backend/internal/user/store/sqlite.go
apps/backend/internal/voice/handlers/transcribe_handlers.go
apps/backend/internal/voice/handlers/transcribe_handlers_test.go
apps/backend/internal/voice/transcribe/service.go
apps/backend/internal/voice/transcribe/service_test.go
apps/web/app/settings/voice-mode/page.tsx
apps/web/components/settings/editors-settings-state.tsx
apps/web/components/settings/keyboard-shortcuts-card.tsx
apps/web/components/settings/settings-app-sidebar.tsx
apps/web/components/settings/voice-mode-settings.tsx
apps/web/components/task/chat/chat-input-body.tsx
apps/web/components/task/chat/chat-input-container.tsx
apps/web/components/task/chat/chat-input-toolbar.tsx
apps/web/components/task/chat/voice-input-button.tsx
apps/web/hooks/use-user-display-settings.ts
apps/web/hooks/use-voice-input.test.ts
apps/web/hooks/use-voice-input.ts
apps/web/lib/api/domains/settings-api.ts
apps/web/lib/api/domains/voice-api.test.ts
apps/web/lib/api/domains/voice-api.ts
apps/web/lib/keyboard/constants.ts
apps/web/lib/keyboard/shortcut-overrides.test.ts
apps/web/lib/keyboard/shortcut-overrides.ts
apps/web/lib/ssr/user-settings.test.ts
apps/web/lib/ssr/user-settings.ts
apps/web/lib/state/slices/settings/settings-slice.ts
apps/web/lib/state/slices/settings/types.ts
apps/web/lib/types/backend.ts
apps/web/lib/types/http-voice.ts
apps/web/lib/types/http.ts
apps/web/lib/voice/capabilities.test.ts
apps/web/lib/voice/capabilities.ts
apps/web/lib/voice/whisper-web-client.ts
apps/web/lib/voice/whisper-web-models.ts
apps/web/lib/ws/handlers/users.ts
apps/web/package.json
apps/web/workers/whisper-web.worker.ts

cubic-dev-ai

9 issues found

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/web/hooks/use-voice-input.ts">

<violation number="1" location="apps/web/hooks/use-voice-input.ts:212">
P2: `cancel()` aborts Web Speech without detaching callbacks, so abort events can still surface as errors/transcripts after the user cancels.</violation>
</file>

<file name="apps/backend/internal/voice/transcribe/service_test.go">

<violation number="1" location="apps/backend/internal/voice/transcribe/service_test.go:58">
P2: Avoid `t.Fatalf` inside the server handler goroutine; use `t.Errorf` and return instead so the test fails safely without invoking `FailNow` from a non-test goroutine.</violation>
</file>

<file name="apps/backend/internal/voice/handlers/transcribe_handlers.go">

<violation number="1" location="apps/backend/internal/voice/handlers/transcribe_handlers.go:42">
P1: `/api/v1/transcribe` is exposed without authentication or throttling, which allows unauthenticated abuse of the server-side OpenAI key.</violation>

<violation number="2" location="apps/backend/internal/voice/handlers/transcribe_handlers.go:59">
P2: Oversized multipart bodies are misclassified as missing-file 400 errors; return 413 when `FormFile` fails due to body size limits.</violation>
</file>

<file name="apps/web/components/settings/voice-mode-settings.tsx">

<violation number="1" location="apps/web/components/settings/voice-mode-settings.tsx:404">
P2: Shortcut persistence also captures `userSettings` in a closure and re-applies it in the async error path, which can roll back unrelated newer changes. Fetch current store state inside the handler/catch instead of writing the captured snapshot.

(Based on your team's feedback about reading live Zustand state in async handlers to avoid stale-closure races.)</violation>
</file>

<file name="apps/backend/internal/voice/transcribe/service.go">

<violation number="1" location="apps/backend/internal/voice/transcribe/service.go:61">
P2: Guard against nil in `WithHTTPClient` to prevent a nil-pointer panic in `Transcribe`.</violation>
</file>

<file name="apps/web/lib/api/domains/voice-api.ts">

<violation number="1" location="apps/web/lib/api/domains/voice-api.ts:28">
P2: `options?.init` is spread after `method`/`body`, so caller-provided init can override the required multipart POST request and break transcription uploads.</violation>
</file>
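The spread-ordering problem in the last violation can be illustrated in isolation. The sketch below is an assumption-laden stand-in: `buildTranscribeInit` and `InitLike` are hypothetical, and the helper only builds the options object rather than calling `fetch`, so the effect of the ordering is visible without a network request.

```typescript
// Hypothetical, reduced stand-in for the fetch init object.
interface InitLike {
  method?: string;
  body?: unknown;
  headers?: Record<string, string>;
}

// Caller-supplied init is spread first; the required fields come last,
// so later keys overwrite earlier ones and the POST + body always win.
function buildTranscribeInit(body: unknown, callerInit?: InitLike): InitLike {
  return {
    ...callerInit,
    method: "POST",
    body,
  };
}

// A caller may still add extras such as headers, but cannot change the method.
const example = buildTranscribeInit("audio-bytes", {
  method: "GET",
  headers: { "X-Trace": "1" },
});
console.log(example.method); // POST
```

With the order reversed (`method` and `body` first, then the spread), the same call would produce a GET with the caller's body, which is the failure the violation describes.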


Bot-review fixes from #1159:
- Map http.MaxBytesError from c.FormFile to 413 instead of bare 400.
- Strip BCP-47 region suffix before passing language hint to Whisper Web.
- Recover from worker crash: terminate + null worker + reset ready flag so next init/transcribe creates a fresh worker instead of hanging.
- Read live store state via storeApi.getState() in voice-mode settings savers to avoid stale-closure clobbering concurrent settings updates.
- Detach Web Speech callbacks before abort() so trailing onend/onerror events don't mutate hook state after cancel().
- Spread caller init before method/body in voice-api fetch so a caller-supplied init can't override the multipart POST.
- Replace t.Fatalf with t.Errorf+return inside test HTTP handler goroutine (FailNow from non-test goroutine is unsafe).
- Add Enabled/AutoSend zeroing assertions in partial-update test so the bool-overwrite-on-PATCH behavior is locked in explicitly.

E2E fix:
- toolbar-overflow spec used button.rounded-full, which now matches both the send and voice buttons (strict-mode violation). Switched to the submit-message-button testid for an unambiguous locator.
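As an illustration of the BCP-47 fix listed above, a region-stripping helper might look like the following. This is a sketch only: `toWhisperLanguage` is a hypothetical name, the `"auto"` sentinel for auto-detect is an assumption, and the commit message does not say how the PR actually implements the stripping.

```typescript
// Reduces a BCP-47 tag such as "pt-BR" to its primary language subtag.
// Returns undefined for empty input or the assumed "auto" sentinel, meaning
// "let the model detect the language".
function toWhisperLanguage(tag: string | undefined): string | undefined {
  if (tag === undefined) return undefined;
  const trimmed = tag.trim();
  if (trimmed === "" || trimmed.toLowerCase() === "auto") return undefined;
  // Accept both "-" and "_" separators, since some sources emit "en_US".
  const [primary = ""] = trimmed.split(/[-_]/);
  return primary === "" ? undefined : primary.toLowerCase();
}

console.log(toWhisperLanguage("en-US")); // en
console.log(toWhisperLanguage("auto")); // undefined
```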

claude · 2026-05-29T16:58:39Z

Claude finished @jcfs's task in 3m 59s.

Code Review — feat: add voice mode for chat input

Read skill instructions and gather context
Analyze backend changes (voice package, user settings, config)
Analyze frontend changes (voice components, hooks, workers)
Identify security, correctness, and architecture issues
Post findings

Findings

Blocker (must fix before merge)

whisper-web-client.ts has no test file — apps/web/lib/voice/whisper-web-client.ts:194
- Issue: No whisper-web-client.test.ts exists. The Worker postMessage/onmessage protocol, pending-request queue, concurrent-call rejection, dispose-during-pending teardown, and blobToWhisperFloat32 audio resample pipeline are all untested.
- Why: This is a non-trivial API-client class with asynchronous state. A broken pending reset on dispose or a wrong ready check in init() would silently degrade the Whisper Web path with no test failure.
- Fix: Create apps/web/lib/voice/whisper-web-client.test.ts. Mock the Worker constructor to intercept postMessage and fire onmessage events manually. Cover: init resolves on { type: "ready" }, rejects on { type: "error" }, is a no-op for same model; transcribe throws before init; concurrent call rejected; dispose rejects pending and terminates worker. Use /tdd.
Capture-path tests missing from use-voice-input.test.ts — apps/web/hooks/use-voice-input.test.ts:199
- Issue: The 199-line test file covers only the Web Speech path. beginCapture, finishCapture, transcribeViaServer, and transcribeViaWhisperWeb have zero coverage. transcribeAudio is already mocked so the server path requires no real backend.
- Why: The capture paths contain the most branchy logic in the hook — empty blob handling, 503 error mapping, driverRef nulling timing — and a regression in any of them would not be caught.
- Fix: Add tests for (a) whisperServer engine: set engine, mock MediaRecorder + getUserMedia, call start() → stop(), assert transcribeAudio was called and onTranscript received the result; (b) empty blob: assert onTranscript is NOT called; (c) 503 error: assert onError called with code: "not-configured". Use /tdd.
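As a rough illustration of the suggested Worker mocking, the sketch below uses a hand-rolled test double rather than a test framework. `FakeWorker`, `initModel`, and the `{ type: "init", modelId }` message shape are assumptions made for the example; the real WhisperWebClient and its wire protocol are not shown in this review.

```typescript
type WorkerMessage = { type: "ready" } | { type: "error"; message: string };

// Test double standing in for the browser Worker; it records outgoing messages.
class FakeWorker {
  posted: unknown[] = [];
  onmessage: ((event: { data: WorkerMessage }) => void) | null = null;
  terminated = false;

  postMessage(message: unknown): void {
    this.posted.push(message);
  }

  terminate(): void {
    this.terminated = true;
  }

  // Test-only helper: simulate a message arriving from the worker thread.
  emit(data: WorkerMessage): void {
    this.onmessage?.({ data });
  }
}

// Hypothetical stand-in for the client's init(): resolves on "ready",
// rejects on "error".
function initModel(worker: FakeWorker, modelId: string): Promise<void> {
  return new Promise((resolve, reject) => {
    worker.onmessage = (event) => {
      if (event.data.type === "ready") resolve();
      else reject(new Error(event.data.message));
    };
    worker.postMessage({ type: "init", modelId });
  });
}
```

A test would call `initModel`, assert on `worker.posted`, then drive the outcome with `worker.emit(...)`. The same double could be extended to cover the dispose and concurrent-call cases listed above.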

Suggestion (recommended, doesn't block)

Stale JSDoc comment references Xenova/* — apps/web/lib/voice/whisper-web-models.ts:5
- Issue: The modelId JSDoc says "the Xenova/* mirrors are pre-quantized for transformers.js" but all entries use onnx-community/* and lines 13-17 explain exactly why Xenova was abandoned (MatMulNBits/WASM crash).
- Why: A future contributor could revert entries to Xenova/* to "match" the comment, reintroducing the browser crash.
- Fix: /** Hugging Face model id (onnx-community/* mirrors; Xenova/* avoided — see MatMulNBits note below). */

Previous review issues — status

All issues from the earlier claude and greptile pass are confirmed fixed in commit 5157743:

✅ 413 vs 400 for oversized bodies (errors.As(err, &maxBytesErr) added)
✅ Dead worker not reset on crash (terminate + null + reset added in ensureWorker error handler)
✅ BCP-47 tags passed to Whisper pipeline (resolveWhisperLang now strips region suffix)
✅ voice-api.ts spread order (caller init now spread before method/body)
✅ Stale closure in useVoiceModeSaver / useShortcutSaver (storeApi.getState() at call time)
✅ t.Fatalf in server handler goroutine (replaced with t.Errorf + return)
✅ Partial update zeroes bool fields — test assertions added
✅ cancel() leaving dangling Web Speech callbacks (null assigned before abort())
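For the BCP-47 item, a minimal sketch of stripping the region suffix might look like the following. `stripRegion` is a hypothetical helper written for illustration; the actual resolveWhisperLang may handle more cases.

```typescript
// Hypothetical helper: keep only the primary language subtag of a BCP-47 tag.
function stripRegion(tag: string): string {
  // "pt-BR" -> "pt", "zh-Hant-TW" -> "zh"; a bare "en" is returned unchanged.
  const [primary = ""] = tag.trim().split("-");
  return primary.toLowerCase();
}
```

Lowercasing is an extra assumption here, on the basis that language hints are usually compared case-insensitively.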

Summary

Severity	Count
Blocker	2
Suggestion	1

Verdict: Blocked — fix the two test-coverage blockers first.

The backend plumbing (transcription service, config, user settings persistence, validation, error hierarchy) is well-structured throughout. The frontend hook, worker, and settings page are clean. The two blockers are purely on test coverage for the WhisperWebClient class and the hook's capture paths — the mocking infrastructure is already in place in the existing test file; these are straightforward additions.

cubic-dev-ai

4 issues found across 8 files (changes from recent commits).


…keys

Round 2 of PR review feedback on #1159:
- voice-mode settings rollback now restores only the keys patched by the failing request (not the whole snapshot), preserving concurrent edits to unrelated voiceMode fields.
- Shortcut rollback diffs previous vs next and reverts only the changed keys (restore prior value or delete if absent), preserving other concurrent shortcut edits.
- Whisper worker error handler captures the worker reference at attach time and only clears refs when this.worker still matches it, so a stale error from a previously replaced worker no longer nukes the active one.
- Update stale whisper-web-models JSDoc that still pointed at Xenova mirrors.
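The stale-worker guard in the error handler can be sketched as below. `WorkerLike` and `WorkerHolder` are hypothetical names used only for this example; the pattern shown is simply "capture the reference at attach time, compare identity before clearing shared state".

```typescript
// Minimal stand-in for the parts of Worker this sketch needs.
interface WorkerLike {
  onerror: (() => void) | null;
  terminate(): void;
}

class WorkerHolder {
  worker: WorkerLike | null = null;
  ready = false;
  private readonly create: () => WorkerLike;

  constructor(create: () => WorkerLike) {
    this.create = create;
  }

  ensureWorker(): WorkerLike {
    if (this.worker) return this.worker;
    const attached = this.create(); // reference captured at attach time
    attached.onerror = () => {
      attached.terminate();
      // Only clear shared state if this handler's worker is still the active one.
      if (this.worker === attached) {
        this.worker = null;
        this.ready = false;
      }
    };
    this.worker = attached;
    return attached;
  }
}
```

Without the identity check, a late error from a worker that has already been replaced would reset `worker` and `ready` for its healthy successor.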

claude · 2026-05-29T17:25:41Z

Claude finished @jcfs's task in 3m 51s

Code Review — feat: add voice mode for chat input

Read SKILL.md and gather review context
Analyze latest commit (283c001) changes
Review all changed files for correctness and quality issues
Post findings

Findings

Blocker (must fix before merge)

Model download progress tooltip shows "5000%" instead of "50%" — apps/web/hooks/use-voice-input.ts:344
- Issue: p.progress from the Whisper Web worker is 0–100 (transformers.js emits percentages). Line 344 stores it directly in modelLoad.progress, but voice-input-button.tsx:260 renders it as Math.round(modelLoad.progress * 100)% — expecting a 0–1 decimal. A 50% download therefore displays as "5000%".
- Why: The "ready" state at line 350 already uses the 0–1 convention (progress: 1), so the invariant is established — this path just breaks it. Any user loading a Whisper Web model sees a broken progress tooltip.
- Fix: divide by 100 at line 344: progress: p.progress / 100
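The scale mismatch is easy to reproduce in isolation. In this sketch, `normaliseProgress` and `formatProgress` are hypothetical helpers mirroring the two sides described above (worker reporting 0 to 100, button expecting 0 to 1); the clamp is an extra safety assumption, not something the review asks for.

```typescript
// Assumption: the worker reports a percentage (0 to 100).
function normaliseProgress(percent: number): number {
  const fraction = percent / 100;
  // Clamp so an out-of-range report cannot produce a nonsensical tooltip.
  return Math.min(1, Math.max(0, fraction));
}

// Mirrors the rendering described for the button: a 0 to 1 fraction shown as a percentage.
function formatProgress(fraction: number): string {
  return `${Math.round(fraction * 100)}%`;
}

const buggy = formatProgress(50); // percentage passed straight through
const fixed = formatProgress(normaliseProgress(50));
```

Here `buggy` is "5000%" and `fixed` is "50%", matching the symptom and the one-line fix.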

Suggestion (recommended, doesn't block)

Stale example model ID in worker wire protocol comment — apps/web/workers/whisper-web.worker.ts:12
- Issue: The init message example still shows "Xenova/whisper-base", but the actual model IDs are "onnx-community/*" — and lines 14–18 explain Xenova was deliberately abandoned to avoid the MatMulNBits WASM crash.
- Fix: update to "onnx-community/whisper-base".
Capture-engine test coverage — apps/web/hooks/use-voice-input.test.ts:199 and apps/web/lib/voice/whisper-web-client.ts
- The whisperServer/whisperWeb capture paths and WhisperWebClient's Worker protocol have zero test coverage. Already acknowledged by author as a planned follow-up; tracking here for visibility.

Previous review findings — status

All earlier blockers confirmed addressed as of 5157743 / 283c001:

✅ 413 vs 400 for oversized bodies (errors.As(err, &maxBytesErr) in transcribe_handlers.go:62)
✅ Worker error handler reset to stale-worker-safe identity check (ensureWorker error listener)
✅ BCP-47 → ISO 639-1 strip in resolveWhisperLang (use-voice-input.ts:137)
✅ voice-api.ts spread order (...options?.init now before method/body)
✅ Stale closure in useVoiceModeSaver / useShortcutSaver (storeApi.getState() at call time)
✅ Rollback scoped to changed keys only (not whole snapshot)
✅ t.Fatalf in server goroutine replaced with t.Errorf + return
✅ Partial update bool assertions added to service_test.go
✅ abortDriver nulls Web Speech callbacks before abort()
✅ Stale Xenova JSDoc in whisper-web-models.ts

Summary

Severity	Count
Blocker	1
Suggestion	2

Verdict: Blocked — fix the progress scale bug before merge. One-liner fix at use-voice-input.ts:344.

cubic-dev-ai

2 issues found across 3 files (changes from recent commits).


Round 3 of PR review feedback on #1159:
- Normalise Whisper progress to 0–1 in ensureWhisperClient (transformers.js emits 0–100, the button's display + the `ready: 1` convention expect 0–1; fixes the 5000% mid-download display).
- Claim driverRef synchronously at the top of finishCapture so concurrent pointerup + pointerleave invocations early-return instead of racing and clobbering a freshly-started recording's ref.
- Race-aware rollback: voice-mode and shortcut catch branches now skip reverting a key when latest[key] !== next[key]. A failed earlier request no longer overwrites a later successful save to the same key.
- Update the Worker wire-protocol comment to show onnx-community/whisper-base instead of the stale Xenova/* example.
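The race-aware rollback rule can be sketched as a pure function. `Settings` and `rollbackPatch` are hypothetical names; `next` is assumed to hold only the keys the failing request patched, `previous` the values before that request, and `latest` the live store state at the time of the failure.

```typescript
type Settings = Record<string, unknown>;

// Revert only the keys this request changed, and only if nobody has
// written them since.
function rollbackPatch(latest: Settings, previous: Settings, next: Settings): Settings {
  const result: Settings = { ...latest };
  for (const key of Object.keys(next)) {
    // A later write owns this key now; leave it alone.
    if (latest[key] !== next[key]) continue;
    // Restore the prior value, or delete the key if it did not exist before.
    if (key in previous) result[key] = previous[key];
    else delete result[key];
  }
  return result;
}
```

Keys outside `next` are never touched, which is what preserves concurrent edits to unrelated fields.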

claude · 2026-05-29T17:51:36Z

Claude finished @jcfs's task in 3m 54s

Code Review — feat: add voice mode for chat input

Read skill instructions and gather context
Analyze latest commit (ffe88a6) changes
Review all changed files for new issues
Post findings

Findings

Previous blockers — all resolved in `ffe88a6`

All issues raised in the prior review passes are confirmed fixed:

✅ Progress scale: p.progress / 100 at use-voice-input.ts:353 — 5000% display gone
✅ Hold-mode double-stop race: driverRef.current = null now claimed synchronously before the first await in finishCapture (use-voice-input.ts:314)
✅ Race-aware rollback: voice-mode and shortcut catch branches skip reverting a key if latest[key] !== next[key] (voice-mode-settings.tsx:108, voice-mode-settings.tsx:439)
✅ Worker wire-protocol comment updated to onnx-community/whisper-base
✅ Whisper models JSDoc updated (whisper-web-models.ts:5)
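The hold-mode double-stop fix relies on claiming the ref before the first await. The following is a minimal sketch of that idea, with a hypothetical `Driver` type and a module-level `driverRef` standing in for the hook's React ref; the real finishCapture does considerably more.

```typescript
// Hypothetical driver: stopping it eventually yields a transcript.
type Driver = { stop(): Promise<string> };

// Stand-in for the hook's ref object.
const driverRef: { current: Driver | null } = { current: null };

async function finishCapture(): Promise<string | null> {
  // Everything up to the first await runs synchronously, so a second caller
  // arriving in the same tick already sees null and bails out.
  const driver = driverRef.current;
  if (!driver) return null;
  driverRef.current = null;
  return await driver.stop();
}
```

If the ref were cleared only after the await, a pointerup and pointerleave firing back to back would both pass the null check and both stop the same driver, and the late clear could wipe a recording started in between.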

Deferred (acknowledged by author, will be tracked in follow-up)

whisper-web-client.ts has no test file — apps/web/lib/voice/whisper-web-client.ts
- The Worker postMessage/onmessage protocol, pending-request queue, concurrent-call rejection, dispose-during-pending teardown, and blobToWhisperFloat32 audio resample pipeline are untested.
- Per project rules this is a blocker. Author has explicitly committed to opening a follow-up issue.
Capture-path tests missing — apps/web/hooks/use-voice-input.test.ts
- whisperServer and whisperWeb engine paths (beginCapture, finishCapture, transcribeViaServer, transcribeViaWhisperWeb) have zero coverage. The mock infrastructure (transcribeAudio via vi.mock, fake MediaRecorder, fake getUserMedia) is already partially established in the file.
- Per project rules this is a blocker. Author has explicitly committed to opening a follow-up issue.

Summary

Severity	Count
Deferred blocker	2

Verdict: All runtime bugs are fixed. The only open items are the two test-coverage gaps the author has explicitly deferred to a follow-up issue. The backend package (transcription service, config, user settings persistence, error hierarchy) is solid throughout. The frontend hook, settings page, worker, and rollback logic are now in good shape after the four rounds of fixes.

greptile-apps Bot reviewed May 29, 2026


Comment thread apps/backend/internal/voice/handlers/transcribe_handlers.go

Comment thread apps/backend/internal/voice/handlers/transcribe_handlers.go

Comment thread apps/web/hooks/use-voice-input.ts