feat(streaming): live transcription preview in HUD (Approach E) #59
Open
rdemeritt wants to merge 15 commits into
Emit the pre-resample native-format AVAudioPCMBuffer from both installTap sites (primary + device-change-restart path). SpeechRecognitionService subscribes to this publisher to feed SFSpeechAudioBufferRecognitionRequest, which requires the hardware's native format rather than the 16 kHz resampled buffer.
New @MainActor service wrapping SFSpeechRecognizer. Streams partial transcription strings while the user holds the hotkey. Silently degrades when on-device recognition is unavailable so recording and Whisper injection are unaffected. Includes requestAuthorization static helper. stop() cancels the buffer subscription before endAudio() to prevent append-after-endAudio crashes.
…blisher
- partialTranscriptPublisher (nonisolated PassthroughSubject) forwards SR partials to callers
- onPreviewClear callback fires after dispatchOutput completes (untilInjected semantics)
- SpeechRecognitionService starts after audioCapture.startCapture() and stops at top of stopAndTranscribe()
- srCancellable forwards SR partials to partialTranscriptPublisher; cancelled on hotkey release
- Add PreviewLingerMode enum and SettingsKey.previewLingerMode (P2 accessor deferred)
- FloatingHUDViewModel gains partialTranscript, showCaptionStrip, subscribeToPartials(), and clearPreview(); notifyRecordingStopped() no longer clears the transcript (pipeline fires onPreviewClear instead)
- WaveformHUDView grows from 56 pt to 80 pt when the caption strip is visible; Text row fades in with 0.15 s delay after pill expansion
- FloatingHUDWindow.recordingPanelHeight bumped to 80 pt
- FloatingHUDWindowController gains subscribeToPartials() and clearPreview() forwarders
…ge description
- NSSpeechRecognitionUsageDescription added to Info.plist
- main.swift wires pipeline.partialTranscriptPublisher → hud.subscribeToPartials() and pipeline.onPreviewClear → hud.clearPreview() after HUD controller setup
…ngs accessor
Manual Codable keeps JSON stable; Hashable enables SwiftUI Picker tags. SettingsManager.previewLingerMode accessor defaults to .linger(seconds: 2).
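A minimal sketch of what a manually coded `PreviewLingerMode` might look like. Only the `.untilInjected` and `.linger(seconds:)` cases are named in this PR; the `kind`/`seconds` JSON keys and the `Double` payload type are assumptions made for illustration, not the PR's actual wire format.

```swift
import Foundation

// Sketch only: the key names and the payload type are assumptions.
enum PreviewLingerMode: Hashable, Codable {
    case untilInjected
    case linger(seconds: Double)

    // Explicit keys keep the stored JSON independent of compiler-synthesized layouts.
    private enum CodingKeys: String, CodingKey {
        case kind
        case seconds
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        let kind = try container.decode(String.self, forKey: .kind)
        switch kind {
        case "untilInjected":
            self = .untilInjected
        case "linger":
            self = .linger(seconds: try container.decode(Double.self, forKey: .seconds))
        default:
            throw DecodingError.dataCorruptedError(
                forKey: .kind,
                in: container,
                debugDescription: "Unknown linger mode: \(kind)"
            )
        }
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .untilInjected:
            try container.encode("untilInjected", forKey: .kind)
        case .linger(let seconds):
            try container.encode("linger", forKey: .kind)
            try container.encode(seconds, forKey: .seconds)
        }
    }
}
```

Hashable is synthesized here because the only associated value is itself Hashable, which is what lets each case serve as a Picker tag.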
- FloatingHUDViewModel: handlePreviewClear(mode:) with Task-based linger timer; clearPreview() cancels any in-flight timer and resets state; notifyRecordingStarted() calls clearPreview() to reset on new recording
- FloatingHUDWindowController: handlePreviewClear(mode:) forwarding method
- main.swift: onPreviewClear reads settingsManager.previewLingerMode live at inject time; no polling observer needed
- SettingsView: "After transcription completes" Picker inside Display section, visible only when HUD is enabled
- Request auth when .notDetermined; bail silently on first recording
- Log [SR] warnings on all guard exits (not authorized, unavailable, on-device unsupported)
- Handle recognition task error: log and cleanupTask() instead of propagating
- Add os.log import to satisfy FileLogger dependency chain
New case signals SR preview was unavailable this session without surfacing to the user. Wired into errorDescription, severity (.info), and isUserActionable (false). main.swift switch updated for exhaustiveness.
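As a rough illustration of how such a case could be wired into the three properties mentioned above, here is a sketch. The `Severity` enum, its cases other than `.info`, and the message wording are hypothetical; the real `PipelineError` has other cases that are not shown in this PR text.

```swift
import Foundation

// Hypothetical severity scale; only `.info` is confirmed by the commit message.
enum Severity {
    case info
    case warning
    case error
}

// Sketch with the single new case; the real enum's other cases are omitted.
enum PipelineError: LocalizedError {
    case previewUnavailable(String)

    // Human-readable text, used for logging rather than user-facing alerts.
    var errorDescription: String? {
        switch self {
        case .previewUnavailable(let reason):
            return "Live preview unavailable: \(reason)"
        }
    }

    // Informational only: the Whisper pipeline still runs without the preview.
    var severity: Severity {
        switch self {
        case .previewUnavailable:
            return .info
        }
    }

    // Nothing for the user to fix, so the error is never surfaced as an action.
    var isUserActionable: Bool {
        switch self {
        case .previewUnavailable:
            return false
        }
    }
}
```

Switching exhaustively in each property (rather than using `default`) means the compiler flags every property when a new case is added, which matches the exhaustiveness update to main.swift noted above.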
…pography tokens
DesignTokens:
- Add HalideTokens.fontCaption (11pt rounded regular)
FloatingHUDView:
- Caption strip always present during recording (height fixed at 80pt)
- "Listening..." placeholder shown when isRecording && !showCaptionStrip
- Placeholder uses HalideTokens.textTertiary; partial text uses textSecondary
- Caption text uses HalideTokens.fontCaption (replaces hardcoded .system call)
- captionHighlighted: Bool drives 400ms textPrimary flash on preview clear
- handlePreviewClear: .untilInjected and .linger both trigger flash before countdown
- Caption .animation respects @Environment(\.accessibilityReduceMotion)
- Remove redundant showCaptionStrip size animation binding
Replaces tail-truncation with a sliding window that shows the last 7 words of the live partial transcript. A left-edge LinearGradient fade mask signals that earlier words have scrolled off. Fade suppressed for fewer than 8 words and when accessibilityReduceMotion is on.
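The windowing rule described above can be sketched as a pure function. The names `CaptionWindow` and `captionWindow(for:maxWords:reduceMotion:)` are hypothetical, and the actual LinearGradient mask lives in SwiftUI and is not reproduced here; this only shows the word selection and the fade condition.

```swift
import Foundation

// Hypothetical value type describing what the caption strip should render.
struct CaptionWindow: Equatable {
    let text: String
    let showsLeadingFade: Bool
}

// Keeps the last `maxWords` words of the partial transcript.
// The fade is shown only when earlier words were dropped (8+ words for the
// default window of 7) and Reduce Motion is off.
func captionWindow(
    for partial: String,
    maxWords: Int = 7,
    reduceMotion: Bool = false
) -> CaptionWindow {
    let words = partial.split(whereSeparator: { $0.isWhitespace })
    let visible = words.suffix(maxWords).joined(separator: " ")
    let droppedEarlierWords = words.count > maxWords
    return CaptionWindow(
        text: visible,
        showsLeadingFade: droppedEarlierWords && !reduceMotion
    )
}
```

Keeping the rule in a plain function like this would make it unit-testable without a view hierarchy; whether the PR structures it this way is not stated.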
`brew` and `swiftlint` aren't in the self-hosted runner's default PATH. Use /opt/homebrew/bin/ prefix, consistent with the SQLCipher install step.
Frame-then-padding order was clipping the inset to zero — the 200pt text frame filled the pill and `.padding(.horizontal, 10)` overflowed and was clipped by the parent. Flip to padding-first so the text content area is 172pt (200 - 28pt) with equal 14pt margins inside the pill.
- sorted_imports: fix import order in SpeechRecognitionService.swift
- unneeded_break_in_switch: remove redundant break in errorHold case
- file_length: move VoiceCommand.displayLabel to VoiceCommand+UI.swift, bringing FloatingHUDWindow.swift from 404 to 389 lines
Summary
Adds live streaming transcription preview using on-device `SFSpeechRecognizer` while the user speaks. Final inject via the existing Whisper batch pipeline is unchanged. QA-approved across all three phases before merge.

Supersedes: PR #58 (closed when base branch was deleted after PR #57 merged).
What ships
P1 — Live preview MVP
- `AudioCaptureService`: `audioBufferPublisher` emits pre-resample native-format buffers from the existing tap (both primary + device-change-restart sites)
- `SpeechRecognitionService` (new): `SFSpeechRecognizer` with `requiresOnDeviceRecognition = true`; publishes partials; silently degrades if SR unavailable/unauthorized
- `TranscriptionPipeline`: starts SR on `startRecording()`, cancels at top of `stopAndTranscribe()`, fires `onPreviewClear` after inject success and Whisper error
- `FloatingHUDView`: caption strip below waveform (200×80 recording pill); "Listening…" placeholder
- `Info.plist`: `NSSpeechRecognitionUsageDescription`

P2 — Configurable persistence
- `PreviewLingerMode` enum with manual `Codable` + synthesized `Hashable`
- `SettingsManager.previewLingerMode`, default `.linger(seconds: 2)`

P3 — Resilience + polish
- `.notDetermined` triggers async system prompt, bails for session; all exits logged
- `PipelineError.previewUnavailable(String)`: `.info` severity
- `HalideTokens.fontCaption`, "Listening…" placeholder, inject flash animation, `accessibilityReduceMotion`

QA gates
`onPreviewClear` on Whisper error path)

Test plan