support using VAD with a streaming STT #4043
Something is off here. Whenever I take a pause longer than a few seconds, the connection throws an APIStatusError(message="ElevenLabs STT connection closed unexpectedly") with a WSMessage(type=<WSMsgType.CLOSE: 8>, data=1000, extra=''), but it doesn't happen with the non-wrapped version.
My hunch is that it doesn't like empty audio data when committing.
Even if I changed the ping time to 1s, it still throws the same error. Not sure if the issue is on our end.
livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt_v2.py
livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py
It seems the ElevenLabs STT has a timeout on audio input; we may need an option to always send audio to the STT. Update: add a
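As a rough illustration of the "always send audio" idea, the sketch below substitutes zero-filled 16-bit PCM frames whenever there is no speech frame to forward, so the server keeps receiving input. Everything in it is an assumption made for illustration: the helper names `silence_frame` and `frames_to_send`, the 16 kHz sample rate, and the 20 ms frame size are hypothetical and are not taken from the plugin.

```python
from typing import Iterable, Iterator, Optional

SAMPLE_RATE = 16000    # assumed sample rate in Hz
FRAME_MS = 20          # assumed frame duration in milliseconds
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM


def silence_frame(sample_rate: int = SAMPLE_RATE, frame_ms: int = FRAME_MS) -> bytes:
    """Return one frame of zero-valued 16-bit mono PCM (hypothetical helper)."""
    num_samples = sample_rate * frame_ms // 1000
    return bytes(num_samples * BYTES_PER_SAMPLE)


def frames_to_send(frames: Iterable[Optional[bytes]]) -> Iterator[bytes]:
    """Yield real frames unchanged; replace gaps (None) with silence frames."""
    for frame in frames:
        yield frame if frame is not None else silence_frame()
```

Here `None` stands for a slot in which VAD reported no speech. Whether the server treats zero-valued audio as harmless is a separate question, as the replies below discuss.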
Thanks for adding that option this quickly. However, I don't think it works well with 11labs: I am getting this: with the zero silence.
I think that's an issue with ElevenLabs: even when the audio is passed through, it may generate either these tags or some random characters if there is slight background noise. When we enable interruption from interim transcripts, this actually breaks the agent playout, and for now I don't think there is a good solution. I would expect them to improve their VAD model or fix this.
Yeah, I agree. Should we just add a warning somewhere in the example or readme? I think it is totally fine to have the implementation available.
chenghao-mou
left a comment
LGTM
@chenghao-mou update:
stt=stt.StreamAdapter(
    stt=elevenlabs.STT(
        use_realtime=True,
        server_vad=None,  # disable server-side VAD
        language_code="en",
    ),
    vad=ctx.proc.userdata["vad"],
    use_streaming=True,
),
You can test it with this example: https://github.com/livekit/agents/blob/longc/stream-stt-flush/examples/other/elevenlab_scribe_v2.py
Yes, that was how I tested. It just hallucinates a lot no matter what options I tried.
add a STTCapabilities.flush to indicate if the stt supports flush (manual commit), and make stt.StreamAdapter work with streaming STT. use cases:
related to #3881, should be merged to main when #4041 is done
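To make the capability-flag idea concrete, here is a minimal, self-contained sketch of how an adapter might check a flush capability before driving a streaming STT from VAD events. It does not reproduce the LiveKit implementation: `Capabilities`, `FakeStreamingSTT`, and `VadGatedAdapter` are hypothetical stand-ins, and raising an error when flush is unsupported is an assumed behaviour, not something this PR states.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Capabilities:
    # Hypothetical stand-in for STTCapabilities; only the fields discussed here.
    streaming: bool
    flush: bool = False  # True if the STT supports a manual commit


class FakeStreamingSTT:
    """Toy streaming STT that records what it is asked to do."""

    def __init__(self, capabilities: Capabilities) -> None:
        self.capabilities = capabilities
        self.log: List[str] = []

    def push_frame(self, frame: bytes) -> None:
        self.log.append(f"push:{len(frame)}")

    def flush(self) -> None:
        self.log.append("flush")


class VadGatedAdapter:
    """Forward speech frames and commit when VAD reports end of speech."""

    def __init__(self, stt: FakeStreamingSTT) -> None:
        # Assumed policy: refuse STTs that cannot be committed manually.
        if not (stt.capabilities.streaming and stt.capabilities.flush):
            raise ValueError("streaming mode requires an STT that supports flush")
        self._stt = stt

    def on_speech_frame(self, frame: bytes) -> None:
        self._stt.push_frame(frame)

    def on_end_of_speech(self) -> None:
        self._stt.flush()
```

In this sketch, the VAD decides when an utterance ends and the adapter turns that event into a manual commit, which is why the STT has to advertise flush support up front.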