description
note: this feature requires you to be a server booster on discord.gg/shapes

Shape Voices

{% embed url="https://files.shapes.inc/api/files/2024-02-12-23-07-02.mp3" %} sent by a Shape 👀 {% endembed %}

{% embed url="https://youtu.be/O3ePvdjTX14" %} walk-through {% endembed %}

Join discord.gg/shapes and boost the server

Head to shapes.inc and open the Voice Engine page

Now we're ready to set up your shape's voice 😎

1. Upload your shape's voice using an audio file

2. Scroll down and set your shape's voice frequency

  • 0.10 means your shape will send voice messages 10% of the time
  • We recommend keeping the voice stability, similarity, and style settings at their default values unless you understand how changing them affects the quality of your shape's voice.

3. Enable voice and save changes

4. Voice models

Here, you can select the voice model you want your shape to use.

Multilingual v2 (slow) is optimized for quality and accuracy, ideal for content creation. This model offers the best quality and stability but has higher latency.

Multilingual v2 (Turbo) and Eleven Flash v2 (English) are designed for low-latency applications like real-time conversational AI. They deliver great performance with faster processing speeds, though with a slight trade-off in accuracy and stability.

Multilingual v2 is the default.

5. Bonus - turn on transcript

Adjusting Parameters

Voice Frequency

We recommend 0.1. This sets how often your shape replies with voice messages: 0 means your shape will never send a voice message, and 1 means it will always reply with voice messages.
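To get a feel for what a given frequency means in practice, here is a minimal Python sketch that treats the voice frequency as a per-reply probability. This is only an illustration of the setting as described above, not how Shapes implements it; the function name `should_send_voice` is hypothetical.

```python
import random


def should_send_voice(voice_frequency: float, rng: random.Random) -> bool:
    """Decide whether one reply is sent as a voice message (illustrative only)."""
    if not 0.0 <= voice_frequency <= 1.0:
        raise ValueError("voice_frequency must be between 0 and 1")
    # rng.random() returns a float in [0.0, 1.0), so a frequency of 0
    # never triggers a voice reply and a frequency of 1 always does.
    return rng.random() < voice_frequency


# Simulate many replies at the recommended frequency of 0.1.
rng = random.Random(0)
replies = 10_000
voice_replies = sum(should_send_voice(0.1, rng) for _ in range(replies))
print(f"{voice_replies / replies:.1%} of simulated replies were voice messages")
```

Over many replies the share of voice messages settles near the chosen frequency, but any short stretch of conversation can have more or fewer.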

Voice Stability

The default is 0.53. Lower values make the voice more variable, which can make speech more expressive, but the output may differ between re-generations and can become unstable. Higher values make the voice more consistent between re-generations, but can also make your shape sound a bit monotone.

Voice Similarity

The default is 0.74. Low values are recommended if background artifacts are present in generated speech. High values boost overall voice clarity and target speaker similarity. Very high values can cause artifacts, so adjusting this setting to find the optimal value is encouraged.

Voice Style

The default is 0.16. High values are recommended if the style of the speech should be exaggerated compared to the uploaded audio. Higher values can lead to more instability in the generated speech. Setting this to 0.0 will greatly increase voice generation speed.
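If you like to keep notes on the values you try, the four parameters can be written down as a small Python record. This is a sketch for personal bookkeeping only: the class `VoiceSettings` and its field names are hypothetical, and the 0 to 1 range for stability, similarity, and style is an assumption based on the frequency range described above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical record of the Voice Engine sliders, using the values from this page."""

    frequency: float = 0.10   # recommended value
    stability: float = 0.53   # default
    similarity: float = 0.74  # default
    style: float = 0.16       # default

    def __post_init__(self) -> None:
        # Assumption: every slider accepts values from 0 to 1.
        for field in fields(self):
            value = getattr(self, field.name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{field.name} must be between 0 and 1, got {value}")


defaults = VoiceSettings()
# Style at 0.0 trades exaggeration for faster voice generation.
faster = VoiceSettings(style=0.0)
print(defaults)
print(faster)
```

Changing one value at a time, as in `faster`, makes it easier to tell which setting caused a change in how your shape sounds.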

Good voice samples usually follow these guidelines:

  • use clear, audible voices
  • include a wide range of vowels and syllables
  • make the sample as long as possible without going over 4MB in size
  • if you want an accent, make sure it’s noticeable and clear
  • gender matters: a masculine voice will sound vastly different from a feminine or androgynous voice
  • be careful with interjections! They can change how some things are pronounced
  • when using multiple speakers in a sample, make sure the one you want to hear more often is louder and more consistent! Otherwise the AI will tend to average the speakers' voices together
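Before uploading, you can check a sample against the size limit with a few lines of Python. This is a minimal sketch: the helper `sample_fits` is hypothetical, and reading "4MB" as 4 × 1024 × 1024 bytes is an assumption, so leave some headroom if your file is close to the limit.

```python
import os

# Assumption: "4MB" is interpreted as 4 * 1024 * 1024 bytes.
MAX_SAMPLE_BYTES = 4 * 1024 * 1024


def sample_fits(path: str, limit: int = MAX_SAMPLE_BYTES) -> bool:
    """Return True when the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

For example, `sample_fits("my_voice.mp3")` (with a hypothetical file name) returns `False` for a sample that needs trimming or re-encoding before upload.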

{% hint style="warning" %} If you unboost the server, your shape's voice and its settings will be deleted. {% endhint %}

Talk to your Shape like normal :)