[RFC] 012 - Support TTS & STT Voice Conversation #367
Replies: 7 comments 6 replies
-
Mark. This is really needed; let's see if we can build a component for it.
-
What would the implementation pipeline in Lobe roughly look like?
-
Open question: how should ChatGPT streaming and TTS streaming be combined? One idea: split the streamed ChatGPT output on line breaks or sentence/paragraph punctuation, feed each segment to TTS for synthesis, play the synthesized clips from a queue, and in parallel use a recorder to stitch them back into one complete long audio track. A sketch of the segmentation step is below.
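A minimal sketch of that segmentation step, assuming the ChatGPT reply arrives as incremental text chunks; `createSegmenter` and the punctuation set are hypothetical names used only for illustration, not part of the codebase:

```ts
// Split streamed text into TTS-sized segments at sentence-ending punctuation.
const SENTENCE_END = /[。!?.!?\n]/;

export const createSegmenter = (onSegment: (segment: string) => void) => {
  let buffer = '';

  return {
    // Call for every streamed chunk coming back from ChatGPT.
    push: (chunk: string) => {
      buffer += chunk;
      let index = buffer.search(SENTENCE_END);
      while (index !== -1) {
        const segment = buffer.slice(0, index + 1).trim();
        if (segment) onSegment(segment); // hand the sentence over to TTS
        buffer = buffer.slice(index + 1);
        index = buffer.search(SENTENCE_END);
      }
    },
    // Call once the stream ends to flush whatever is left in the buffer.
    flush: () => {
      const rest = buffer.trim();
      if (rest) onSegment(rest);
      buffer = '';
    },
  };
};
```

Each segment handed to `onSegment` can be synthesized and pushed into a playback queue, and the same queue can feed a recorder that concatenates the clips into the final long audio.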
-
useSpeechRecognition

```ts
import { useMemo, useState } from 'react';

export const useSpeechRecognition = (locale: string) => {
  const [text, setText] = useState<string>('');
  const [processing, setProcessing] = useState<boolean>(false);

  // Create a single recognition instance per locale instead of one per render.
  const recognition = useMemo(() => {
    const instance = new (window as any).webkitSpeechRecognition();
    instance.lang = locale;
    instance.interimResults = true;
    instance.continuous = true;
    return instance;
  }, [locale]);

  recognition.onstart = () => {
    setProcessing(true);
    setText('');
  };
  recognition.onend = () => setProcessing(false);
  recognition.onresult = ({ results }: any) => {
    if (!results) return;
    const result = results[0];
    if (result?.[0]?.transcript) setText(result[0].transcript);
    if (result.isFinal) recognition.abort();
  };

  return {
    processing,
    // Wrap in arrow functions so `this` stays bound to the recognition instance.
    start: () => recognition.start(),
    stop: () => recognition.stop(),
    text,
  };
};
```

useSpeechSynthes

```ts
import { useMemo, useState } from 'react';

import { SsmlOptions } from '@/useTTS/utils/genSSML';
import { VoiceList } from '@/useTTS/utils/getVoiceList';

export const useSpeechSynthes = (options: SsmlOptions) => {
  const [text, setText] = useState<string>('');
  const [processing, setProcessing] = useState<boolean>(false);

  const speechSynthesisUtterance = useMemo(() => {
    const utterance = new SpeechSynthesisUtterance(text);
    // Note: `voice` expects a SpeechSynthesisVoice; matching `options.name`
    // against speechSynthesis.getVoices() would be the stricter approach.
    utterance.voice = options.name as any;
    if (options.pitch) utterance.pitch = options.pitch;
    if (options.rate) utterance.rate = options.rate;
    return utterance;
  }, [text]);

  // Group the available voices by language.
  const voiceList: VoiceList = useMemo(() => {
    const data = speechSynthesis.getVoices();
    const list: VoiceList = {};
    for (const voice of data) {
      if (!list[voice.lang]) list[voice.lang] = [];
      list[voice.lang].push({ localName: voice.name, name: voice.voiceURI });
    }
    return list;
  }, []);

  speechSynthesisUtterance.onstart = () => setProcessing(true);
  speechSynthesisUtterance.onend = () => setProcessing(false);

  return {
    processing,
    setText,
    start: () => speechSynthesis.speak(speechSynthesisUtterance),
    // Call cancel through speechSynthesis instead of passing the method bare.
    stop: () => speechSynthesis.cancel(),
    voiceList,
  };
};
```
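A minimal usage sketch of wiring the two hooks into a chat input; the component name, import paths, and voice name are placeholders, and the transcript is copied into the TTS hook in a separate click so the new text is already in state before `start()` is called:

```tsx
import { useSpeechRecognition } from './useSpeechRecognition';
import { useSpeechSynthes } from './useSpeechSynthes';

const VoiceChatDemo = () => {
  const stt = useSpeechRecognition('zh-CN');
  const tts = useSpeechSynthes({ name: 'zh-CN-XiaoxiaoNeural' });

  return (
    <div>
      {/* Push-to-talk: the transcript accumulates in stt.text */}
      <button onClick={stt.processing ? stt.stop : stt.start}>
        {stt.processing ? 'Stop' : 'Speak'}
      </button>
      <p>{stt.text}</p>

      {/* Hand the transcript to the TTS hook, then read it back */}
      <button onClick={() => tts.setText(stt.text)}>Use transcript</button>
      <button onClick={tts.start} disabled={tts.processing}>
        Read aloud
      </button>
    </div>
  );
};

export default VoiceChatDemo;
```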
-
ChatGPT 4V has made a TTS voice API public; worth looking into.
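A minimal sketch of calling that endpoint with the official `openai` Node SDK, assuming an `OPENAI_API_KEY` in the environment; the `tts-1` model and `alloy` voice are the publicly documented defaults, and the output path is arbitrary:

```ts
import fs from 'node:fs/promises';
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const synthesize = async (input: string) => {
  // POST /v1/audio/speech returns binary audio (mp3 by default).
  const response = await openai.audio.speech.create({
    input,
    model: 'tts-1',
    voice: 'alloy',
  });
  await fs.writeFile('speech.mp3', Buffer.from(await response.arrayBuffer()));
};

synthesize('Hello from LobeChat!');
```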
-
The TTS and STT features can now be tested. The currently supported TTS and STT services are shown in the screenshot; Azure TTS support will come later.
-
Important
Drafting... refs: #267
Progress: lobe-tts.vercel.app
PR: #443
Caniss.video.mp4
Background
The idea is to add a voice-modality conversational interaction to the chat.
Reference: OpenAI's official pro feature: https://openai.com/blog/chatgpt-can-now-see-hear-and-speak
TTS services
A. Microsoft Speech API
B. Azure Speech API: the official API (requires binding a card), good stability; paid, with a free tier of 500,000 characters per month for TTS (T2V) plus 5 hours of audio per month for STT (V2T)
C. Edge TTS WSS
D. speechSynthesis: free, but the synthesized voices sound too robotic; see the 4th comment above
SSML
SSML can control timbre, emotion, intonation, and so on. By prompting ChatGPT to reply in SSML format, the conversation output can carry emotion; a sketch of generating such SSML follows.
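A minimal sketch of what an SSML generator like the `genSSML` util imported above might produce; the `SsmlOptions` shape and the `mstts:express-as` style tag are assumptions based on the Azure SSML dialect, not the project's actual implementation:

```ts
// Hypothetical SSML builder, loosely modeled on the Azure Speech SSML dialect.
interface SsmlOptions {
  name: string; // e.g. 'zh-CN-XiaoxiaoNeural'
  pitch?: number; // relative pitch, in percent
  rate?: number; // relative rate, in percent
  style?: string; // e.g. 'cheerful', 'sad'
}

export const genSSML = (text: string, { name, pitch = 0, rate = 0, style }: SsmlOptions) => {
  const prosody = `<prosody pitch="${pitch}%" rate="${rate}%">${text}</prosody>`;
  const body = style ? `<mstts:express-as style="${style}">${prosody}</mstts:express-as>` : prosody;
  return [
    '<speak version="1.0" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">',
    `<voice name="${name}">${body}</voice>`,
    '</speak>',
  ].join('');
};
```

A system prompt can then ask ChatGPT to wrap its replies in the same `<prosody>` / `express-as` tags, so the emotion and intonation come straight from the model rather than from a fixed post-processing step.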