docs(#1186): Add comprehensive multi-speaker TTS guide #1190

Open
moguangyu5-design wants to merge 1 commit into fishaudio:main from
moguangyu5-design:docs/1186-multi-speaker-usage
Conversation

@moguangyu5-design
Summary

Adds comprehensive documentation for native multi-speaker TTS generation to address Issue #1186.

Changes Made

1. Created Multi-Speaker Guide (docs/multi_speaker_guide.md)

Complete documentation covering:

Four Usage Methods:

  • CLI with direct reference audio
  • CLI with reference ID (stored references)
  • Python API direct integration
  • REST API HTTP calls

Key Documentation:

  • Reference audio requirements (3-10 seconds, clean audio)
  • Memory caching for performance
  • Multi-speaker architecture explanation
  • Troubleshooting common issues
  • Advanced usage patterns
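
The length requirement above (3-10 seconds) can be checked before a reference clip is submitted. The following is a minimal sketch, assuming the reference is an uncompressed PCM WAV file; the helper names are hypothetical and are not part of the guide or the repository. It checks duration only and says nothing about how clean the audio is.

```python
import wave

MIN_SECONDS = 3.0   # lower bound from the "3-10 seconds" requirement
MAX_SECONDS = 10.0  # upper bound from the same requirement


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_valid_reference_length(path: str) -> bool:
    """Check only the length requirement; audio quality is not assessed."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

Other formats such as MP3 or FLAC would need a different reader, since the standard-library `wave` module handles WAV only.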

API Reference:

  • Complete ServeTTSRequest schema
  • ServeReferenceAudio schema
  • All parameters documented
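
To give a rough picture of how a request and its reference samples relate, here is a sketch using stand-in dataclasses. The class names `TTSRequestSketch` and `ReferenceAudioSketch`, the field names (`text`, `references`, `reference_id`, `audio`), and the base64 encoding for JSON transport are all assumptions for illustration. They are not taken from this PR, and the actual `ServeTTSRequest` and `ServeReferenceAudio` definitions in the repository are authoritative.

```python
import base64
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReferenceAudioSketch:
    """Hypothetical stand-in for one reference sample: audio plus its transcript."""
    audio: bytes
    text: str


@dataclass
class TTSRequestSketch:
    """Hypothetical stand-in for a TTS request (field names are assumptions)."""
    text: str
    references: List[ReferenceAudioSketch] = field(default_factory=list)
    reference_id: Optional[str] = None

    def to_json_dict(self) -> dict:
        # Raw bytes are not JSON-serializable, so audio is base64-encoded here.
        # Whether the real server expects this encoding is an assumption.
        return {
            "text": self.text,
            "references": [
                {"audio": base64.b64encode(r.audio).decode("ascii"), "text": r.text}
                for r in self.references
            ],
            "reference_id": self.reference_id,
        }
```

In this sketch a caller supplies either inline `references` or a stored `reference_id`, mirroring the two CLI methods listed in the guide.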

2. Key Features Documented

Voice Cloning:

  • Single vs multiple reference samples
  • Quality requirements and best practices
  • Common issues and solutions

Prosody Control:

Advanced Usage:

  • Multi-language code-switching
  • Style transfer techniques
  • Fine-grained parameter control

How Multi-Speaker Generation Works

  1. Voice Encoding: Extract voice embedding from reference audio
  2. Text Processing: Prepare text for synthesis
  3. In-Context Learning: Model learns voice characteristics from references
  4. Audio Generation: Synthesize speech in the target voice
  5. Post-Processing: Format and return audio

Testing

Documentation includes working examples:

  • All CLI commands tested
  • API calls verified for syntax
  • Python code is functional

Usage Example

python -m tools.api_client \
  --text "Hello, this is synthesized speech." \
  --reference_audio speaker.wav \
  --reference_text "This is the reference transcript." \
  --output output.wav
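
For several speakers, the command above can be scripted. This is a minimal sketch that uses only the flags shown in the example; the helper names and the layout of the `speakers` dictionary are hypothetical. It defaults to a dry run that returns the commands without executing them, and running them for real assumes it is invoked from the repository root.

```python
import subprocess
import sys
from typing import Dict, List


def build_command(text: str, reference_audio: str, reference_text: str, output: str) -> List[str]:
    """Build the argument list for the CLI invocation shown above."""
    return [
        sys.executable, "-m", "tools.api_client",
        "--text", text,
        "--reference_audio", reference_audio,
        "--reference_text", reference_text,
        "--output", output,
    ]


def run_for_speakers(text: str, speakers: Dict[str, Dict[str, str]], dry_run: bool = True) -> List[List[str]]:
    """Render the same text once per speaker, writing <speaker name>.wav.

    `speakers` maps a speaker name to {"audio": path, "text": transcript}.
    With dry_run=True the commands are only returned, not executed.
    """
    commands = []
    for name, ref in speakers.items():
        cmd = build_command(text, ref["audio"], ref["text"], f"{name}.wav")
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the text contains spaces or punctuation.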

Related Issue

Resolves #1186

🤖 Generated with Claude Code

Issue fishaudio#1186: Native Multi-Speaker Generation usage documentation

Added complete documentation covering:
- Four usage methods (CLI with direct reference audio, CLI with reference ID, Python API, REST API)
- Reference audio requirements and best practices
- Troubleshooting common issues
- Advanced usage: code-switching, style transfer
- Complete API reference

Users can now understand how to perform multi-speaker TTS using reference audio
for voice cloning, including all parameters and expected behavior.

🤖 Generated with Claude Code

Resolves fishaudio#1186
Development

Successfully merging this pull request may close these issues.

The usage of Native Multi-Speaker Generation
