[ Discussion ] Realtime voice agents #78

I was planning to use csm-1b as the final (TTS) layer of a real-time voice agent (VAD -> ASR -> LLM -> TTS) that I'm working on. However, its inference is too slow for real-time use, so that seems like an unrealistic goal at this point.

This is the repo I'm working on: [https://github.com/asiff00/On-Device-Speech-to-Speech-Conversational-AI](https://github.com/asiff00/On-Device-Speech-to-Speech-Conversational-AI)

I currently use kokoro as the TTS engine, which is what makes it possible to run in real time.
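
For anyone comparing TTS engines for this kind of pipeline, one common way to put a number on "real-time" is the real-time factor (RTF): synthesis time divided by the duration of the audio produced. An RTF below 1 means the engine generates speech faster than it plays back. Below is a minimal sketch of that measurement. The `synthesize` callable and `fake_tts` are hypothetical stand-ins for illustration, not the actual API of csm-1b or kokoro.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Return wall-clock synthesis time divided by the duration of the audio."""
    start = time.perf_counter()
    samples = synthesize(text)  # hypothetical: returns raw mono samples
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")
    return elapsed / audio_seconds


def fake_tts(text: str) -> list:
    # Hypothetical stand-in engine: 0.1 s of silence per character at 24 kHz.
    return [0.0] * (2400 * len(text))


if __name__ == "__main__":
    rtf = real_time_factor(fake_tts, "hello there", sample_rate=24000)
    print(f"RTF: {rtf:.4f} ({'real-time capable' if rtf < 1 else 'too slow'})")
```

To try it on a real engine, wrap that engine's synthesis call in a function that takes text and returns the generated samples, and pass the sample rate the engine outputs.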
