support using VAD with a streaming STT #4043

longcw · 2025-11-21T06:03:15Z

add STTCapabilities.flush to indicate whether the STT supports flush (manual commit), and make stt.StreamAdapter work with a streaming STT.

use cases:

- only send audio frames to the STT when VAD detects user speech.
- support manual commit for ElevenLabs Scribe v2; fixes the review comment on "feat(elevenlabs): add STTv2 with streaming support for Scribe v2" #3909.

related to #3881; should be merged into main once #4041 is done.
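To illustrate the two use cases above, here is a minimal sketch of VAD-gated forwarding with a manual commit at the end of speech. It is not the code from this PR: `FakeStreamingSTT`, `supports_flush`, `push_frame`, and `forward_with_vad` are hypothetical names chosen for illustration, and `supports_flush` only plays the role that the description gives to STTCapabilities.flush.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStreamingSTT:
    """Hypothetical stand-in for a streaming STT that supports manual commit."""

    supports_flush: bool = True  # assumed analogue of STTCapabilities.flush
    pushed: list = field(default_factory=list)
    commits: int = 0

    def push_frame(self, frame: bytes) -> None:
        # Record the audio that would be sent to the service.
        self.pushed.append(frame)

    def flush(self) -> None:
        # Manual commit: ask the service to finalize the current segment.
        self.commits += 1


def forward_with_vad(stt: FakeStreamingSTT, frames) -> None:
    """Forward only speech frames and commit when speech ends.

    `frames` is an iterable of (audio_bytes, is_speech) pairs, where
    is_speech is the VAD decision for that frame.
    """
    if not stt.supports_flush:
        raise ValueError("this STT does not support flush (manual commit)")

    was_speaking = False
    for audio, is_speech in frames:
        if is_speech:
            stt.push_frame(audio)
        elif was_speaking:
            stt.flush()  # speech just ended
        was_speaking = is_speech

    if was_speaking:
        stt.flush()  # the stream ended in the middle of an utterance
```

In this sketch, silent frames are never sent, so the service only sees audio while the user is speaking, and each utterance is closed with one commit.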

chenghao-mou

Something is off here. Whenever I pause for longer than a few seconds, the connection throws an APIStatusError(message="ElevenLabs STT connection closed unexpectedly") with a WSMessage(type=<WSMsgType.CLOSE: 8>, data=1000, extra=''), but this doesn't happen with the non-wrapped version.

~~My hunch is that it doesn't like empty audio data when committing.~~

Even after I changed the ping time to 1s, it still throws the same error. Not sure whether the issue is on our end.

livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt_v2.py

livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py

longcw · 2025-11-21T10:27:07Z

it seems the ElevenLabs STT has a timeout on audio input; we may need an option to always send audio to the STT.

update: added a silence_mode: Literal["drop", "zeros", "passthrough"] option to send the original or zero-filled frames when VAD is negative. it's true that not every STT supports discontinuous audio frames.
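A rough sketch of how the three silence_mode values could be interpreted for a single frame. This is an assumption based on the option names, not the PR's implementation: `apply_silence_mode` is a hypothetical helper, frames are modeled as raw PCM bytes, and "drop" is read as "send nothing while VAD is negative".

```python
from typing import Literal, Optional

SilenceMode = Literal["drop", "zeros", "passthrough"]


def apply_silence_mode(
    frame: bytes, is_speech: bool, mode: SilenceMode
) -> Optional[bytes]:
    """Return the bytes to send to the STT, or None to skip the frame."""
    if is_speech or mode == "passthrough":
        # Speech frames are always forwarded; passthrough forwards everything.
        return frame
    if mode == "zeros":
        # Keep the audio timeline continuous with zero-filled samples.
        return bytes(len(frame))
    if mode == "drop":
        # Send nothing while VAD is negative (assumed meaning of "drop").
        return None
    raise ValueError(f"unknown silence_mode: {mode!r}")
```

With "zeros" or "passthrough" the STT keeps receiving audio during pauses, which is the behavior a service with an input timeout would need.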

chenghao-mou · 2025-11-21T11:16:44Z

Thanks for adding that option this quickly. However, I don't think it works well with 11labs: I am getting this:

    11:15:00 DEBUG  livekit.plugins… Received message type partial_transcript: {'message_type': 'partial_transcript', 'text': '*static*'}  
    11:15:03 DEBUG  livekit.plugins… Received message type partial_transcript: {'message_type': 'partial_transcript', 'text': '*static*'}  
    11:15:04 DEBUG  livekit.plugins… Received message type partial_transcript: {'message_type': 'partial_transcript', 'text': '*static*'}  
    11:15:07 DEBUG  livekit.plugins… Received message type partial_transcript: {'message_type': 'partial_transcript', 'text': '*static*'}  
    11:15:08 DEBUG  livekit.plugins… Received message type partial_transcript: {'message_type': 'partial_transcript', 'text': '*static*'}

with the zero silence.

longcw · 2025-11-21T11:21:35Z

I think that's an ElevenLabs issue: even when the audio is passed through, it may generate either these tags or some random characters if there is slight background noise.

when interruption from interim transcripts is enabled, this actually breaks the agent playout, and for now I don't think there is a good solution. I expect they will improve their VAD model or fix this.

chenghao-mou · 2025-11-21T11:25:58Z

Yeah, I agree. Should we just add a warning somewhere in the example or readme? I think it is totally fine to have the implementation available.

chenghao-mou

LGTM

longcw · 2025-11-21T11:27:03Z

@chenghao-mou update: ~~it seems the ElevenLabs STT works when server VAD is disabled~~ it's better when server VAD is disabled, but the STT still sometimes produces random output, because it generates text whether the input is silent or just noise.

    stt=stt.StreamAdapter(
        stt=elevenlabs.STT(
            use_realtime=True,
            server_vad=None,  # disable server-side VAD
            language_code="en",
        ),
        vad=ctx.proc.userdata["vad"],
        use_streaming=True,
    ),

you can test it with this example https://github.com/livekit/agents/blob/longc/stream-stt-flush/examples/other/elevenlab_scribe_v2.py

chenghao-mou · 2025-11-21T11:31:40Z

Yes, that was how I tested. It just hallucinates a lot no matter what options I tried.

longcw added 3 commits November 21, 2025 11:24

add use_realtime to 11labs STT and support scribe v2 realtime model

f4f1f3e

raise error

a18611f

support using VAD with a streaming STT

4576f38

longcw requested a review from a team November 21, 2025 06:06

longcw mentioned this pull request Nov 21, 2025

add use_realtime to elevenlabs stt and support scribe v2 realtime model #4041

Merged

chenghao-mou reviewed Nov 21, 2025

longcw closed this Nov 21, 2025

longcw reopened this Nov 21, 2025

add silence_mode

72aae83

chenghao-mou approved these changes Nov 21, 2025

Base automatically changed from longc/11labs-stt-realtime to main November 22, 2025 00:52

Merge remote-tracking branch 'origin/main' into longc/stream-stt-flush

c6c372f
