[RFC] 012 - Support TTS & STT Voice Conversation #367
Replies: 7 comments 6 replies
-
Mark. This is really needed; let's see if we can build a component for it.
-
What would the implementation pipeline in Lobe roughly look like?
-
Open question: how should ChatGPT streaming and TTS streaming be combined? One idea: split the streamed ChatGPT output on line breaks or sentence/paragraph punctuation, feed each segment to TTS for synthesis, play the synthesized clips from a queue, and in parallel use a recorder to stitch them back into one complete long audio track. A sketch of the segmentation step is below.
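A minimal sketch of that segmentation step, assuming the ChatGPT reply arrives as incremental text chunks; `createSegmenter` and the punctuation set are hypothetical names used only for illustration, not part of the codebase:

```ts
// Split streamed text into TTS-sized segments at sentence-ending punctuation.
const SENTENCE_END = /[。!?.!?\n]/;

export const createSegmenter = (onSegment: (segment: string) => void) => {
  let buffer = '';

  return {
    // Call for every streamed chunk coming back from ChatGPT.
    push: (chunk: string) => {
      buffer += chunk;
      let index = buffer.search(SENTENCE_END);
      while (index !== -1) {
        const segment = buffer.slice(0, index + 1).trim();
        if (segment) onSegment(segment); // hand the sentence over to TTS
        buffer = buffer.slice(index + 1);
        index = buffer.search(SENTENCE_END);
      }
    },
    // Call once the stream ends to flush whatever is left in the buffer.
    flush: () => {
      const rest = buffer.trim();
      if (rest) onSegment(rest);
      buffer = '';
    },
  };
};
```

Each segment handed to `onSegment` can be synthesized and pushed into a playback queue, and the same queue can feed a recorder that concatenates the clips into the final long audio.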
-
useSpeechRecognition

```ts
import { useMemo, useState } from 'react';

export const useSpeechRecognition = (locale: string) => {
  const [text, setText] = useState<string>('');
  const [processing, setProcessing] = useState<boolean>(false);

  // Create a single recognition instance per locale instead of one per render.
  const recognition = useMemo(() => {
    const instance = new (window as any).webkitSpeechRecognition();
    instance.lang = locale;
    instance.interimResults = true;
    instance.continuous = true;
    return instance;
  }, [locale]);

  recognition.onstart = () => {
    setProcessing(true);
    setText('');
  };
  recognition.onend = () => setProcessing(false);
  recognition.onresult = ({ results }: any) => {
    if (!results) return;
    const result = results[0];
    if (result?.[0]?.transcript) setText(result[0].transcript);
    if (result.isFinal) recognition.abort();
  };

  return {
    processing,
    // Wrap in arrow functions so `this` stays bound to the recognition instance.
    start: () => recognition.start(),
    stop: () => recognition.stop(),
    text,
  };
};
```

useSpeechSynthes

```ts
import { useMemo, useState } from 'react';

import { SsmlOptions } from '@/useTTS/utils/genSSML';
import { VoiceList } from '@/useTTS/utils/getVoiceList';

export const useSpeechSynthes = (options: SsmlOptions) => {
  const [text, setText] = useState<string>('');
  const [processing, setProcessing] = useState<boolean>(false);

  const speechSynthesisUtterance = useMemo(() => {
    const utterance = new SpeechSynthesisUtterance(text);
    // Note: `voice` expects a SpeechSynthesisVoice; matching `options.name`
    // against speechSynthesis.getVoices() would be the stricter approach.
    utterance.voice = options.name as any;
    if (options.pitch) utterance.pitch = options.pitch;
    if (options.rate) utterance.rate = options.rate;
    return utterance;
  }, [text]);

  // Group the available voices by language.
  const voiceList: VoiceList = useMemo(() => {
    const data = speechSynthesis.getVoices();
    const list: VoiceList = {};
    for (const voice of data) {
      if (!list[voice.lang]) list[voice.lang] = [];
      list[voice.lang].push({ localName: voice.name, name: voice.voiceURI });
    }
    return list;
  }, []);

  speechSynthesisUtterance.onstart = () => setProcessing(true);
  speechSynthesisUtterance.onend = () => setProcessing(false);

  return {
    processing,
    setText,
    start: () => speechSynthesis.speak(speechSynthesisUtterance),
    // Call cancel through speechSynthesis instead of passing the method bare.
    stop: () => speechSynthesis.cancel(),
    voiceList,
  };
};
```
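A minimal usage sketch of wiring the two hooks into a chat input; the component name, import paths, and voice name are placeholders, and the transcript is copied into the TTS hook in a separate click so the new text is already in state before `start()` is called:

```tsx
import { useSpeechRecognition } from './useSpeechRecognition';
import { useSpeechSynthes } from './useSpeechSynthes';

const VoiceChatDemo = () => {
  const stt = useSpeechRecognition('zh-CN');
  const tts = useSpeechSynthes({ name: 'zh-CN-XiaoxiaoNeural' });

  return (
    <div>
      {/* Push-to-talk: the transcript accumulates in stt.text */}
      <button onClick={stt.processing ? stt.stop : stt.start}>
        {stt.processing ? 'Stop' : 'Speak'}
      </button>
      <p>{stt.text}</p>

      {/* Hand the transcript to the TTS hook, then read it back */}
      <button onClick={() => tts.setText(stt.text)}>Use transcript</button>
      <button onClick={tts.start} disabled={tts.processing}>
        Read aloud
      </button>
    </div>
  );
};

export default VoiceChatDemo;
```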
-
ChatGPT 4V has made a TTS voice API public; worth looking into.
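A minimal sketch of calling that endpoint with the official `openai` Node SDK, assuming an `OPENAI_API_KEY` in the environment; the `tts-1` model and `alloy` voice are the publicly documented defaults, and the output path is arbitrary:

```ts
import fs from 'node:fs/promises';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const synthesize = async (input: string) => {
  // POST /v1/audio/speech returns binary audio (mp3 by default).
  const response = await openai.audio.speech.create({
    input,
    model: 'tts-1',
    voice: 'alloy',
  });
  await fs.writeFile('speech.mp3', Buffer.from(await response.arrayBuffer()));
};

synthesize('Hello from LobeChat!');
```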
-
The TTS and STT features can now be tested. The currently supported TTS and STT services are shown in the screenshot; Azure TTS support will come later.
-
Important
Drafting... refs: #267
Progress: lobe-tts.vercel.app
PR: #443
Caniss.video.mp4
Background
The idea is to add a voice-modality conversational interaction to the chat.
Reference: OpenAI's official pro feature: https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
TTS services
A. Microsoft Speech API
B. Azure Speech API: the official API (requires binding a card), good stability; paid, with a free tier of 500,000 characters per month for TTS (T2V) plus 5 hours of audio per month for STT (V2T)
C. Edge TTS WSS
D. speechSynthesis: free, but the synthesized voices sound too robotic; see the 4th comment above
SSML
SSML can control timbre, emotion, intonation, and so on. By prompting ChatGPT to reply in SSML format, the conversation output can carry emotion; a sketch of generating such SSML follows.
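A minimal sketch of what an SSML generator like the `genSSML` util imported above might produce; the `SsmlOptions` shape and the `mstts:express-as` style tag are assumptions based on the Azure SSML dialect, not the project's actual implementation:

```ts
// Hypothetical SSML builder, loosely modeled on the Azure Speech SSML dialect.
interface SsmlOptions {
  name: string; // e.g. 'zh-CN-XiaoxiaoNeural'
  pitch?: number; // relative pitch, in percent
  rate?: number; // relative rate, in percent
  style?: string; // e.g. 'cheerful', 'sad'
}

export const genSSML = (text: string, { name, pitch = 0, rate = 0, style }: SsmlOptions) => {
  const prosody = `<prosody pitch="${pitch}%" rate="${rate}%">${text}</prosody>`;
  const body = style ? `<mstts:express-as style="${style}">${prosody}</mstts:express-as>` : prosody;
  return [
    '<speak version="1.0" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">',
    `<voice name="${name}">${body}</voice>`,
    '</speak>',
  ].join('');
};
```

A system prompt can then ask ChatGPT to wrap its replies in the same `<prosody>` / `express-as` tags, so the emotion and intonation come straight from the model rather than from a fixed post-processing step.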