
VyvoTTS: LLM-Based Text-to-Speech Training Framework 🚀

VyvoTTS Logo

VyvoTTS is a Text-to-Speech (TTS) training and inference framework built on top of large language models (LLMs), developed by the VyvoTTS team.

✨ Features

  • Pre-training: Train LLM-based TTS models from scratch on custom datasets
  • Fine-tuning: Adapt pre-trained models to specific TTS tasks
  • LoRA Adaptation: Memory-efficient fine-tuning via Low-Rank Adaptation
  • Voice Cloning: Clone voices using advanced neural techniques
  • Multi-GPU Support: Distributed training with Hugging Face accelerate

📦 Installation

uv venv --python 3.10
uv pip install -r requirements.txt
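
After installing, a quick environment check can catch missing GPU drivers early (a minimal sketch; it assumes PyTorch is pulled in by requirements.txt):

import torch

# Confirm that a CUDA-capable GPU is visible before launching training.
print(torch.__version__)
print(torch.cuda.is_available())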

🚀 Quick Start

Dataset Preparation

VyvoTTS provides a unified tokenizer that works with both Qwen3 and LFM2 models. The tokenizer reads configuration from YAML files for flexibility.

Tokenizer Usage

from vyvotts.audio_tokenizer import process_dataset

# For Qwen3
process_dataset(
    original_dataset="MrDragonFox/Elise",
    output_dataset="username/dataset-name",
    model_type="qwen3",
    text_field="text"
)

# For LFM2
process_dataset(
    original_dataset="MrDragonFox/Elise",
    output_dataset="username/dataset-name",
    model_type="lfm2",
    text_field="text"
)
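
To sanity-check the result, the processed dataset can be loaded back with the Hugging Face datasets library (a minimal sketch; the exact column names written by process_dataset are an assumption, so inspect the schema rather than relying on specific fields):

from datasets import load_dataset

# Load the processed dataset from the Hub and inspect its schema.
ds = load_dataset("username/dataset-name", split="train")
print(ds.column_names)
print(ds[0])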

Training

Fine-tuning

⚠️ GPU Requirements: Fine-tuning requires a minimum of 30 GB of VRAM.

Configure your fine-tuning parameters in vyvotts/configs/lfm2_ft.yaml and run:

accelerate launch --config_file vyvotts/configs/accelerate_finetune.yaml vyvotts/train.py

💻 For lower-end GPUs (6 GB+): Use the Unsloth FP8/FP4 training notebook:

uv pip install jupyter notebook
uv run jupyter notebook notebook/vyvotts-lfm2-train.ipynb

Pre-training

Configure your pre-training parameters in vyvotts/configs/lfm2_config.yaml and run:

accelerate launch --config_file vyvotts/configs/accelerate_pretrain.yaml vyvotts/train.py

Inference

VyvoTTS provides multiple inference backends optimized for different use cases:

1. Transformers Inference (Standard)

Standard inference using Hugging Face Transformers at full precision.

from vyvotts.inference.transformers_inference import VyvoTTSTransformersInference

# Initialize engine
engine = VyvoTTSTransformersInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette"
)

# Generate speech
audio, timing_info = engine.generate(
    text="Hello, this is a test of the text to speech system.",
    voice=None,  # Optional: specify voice name
    output_path="output.wav",  # Optional: save directly to file
    max_new_tokens=1200,
    temperature=0.6,
    top_p=0.95
)
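
If you prefer to handle the waveform yourself rather than passing output_path, it can be written to disk with soundfile (a sketch, assuming generate returns a NumPy float array; the 24 kHz sample rate is an assumption, so check the engine's actual output rate):

import soundfile as sf

# Write the returned waveform to disk; 24000 Hz is an assumed sample rate.
sf.write("output_manual.wav", audio, 24000)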

2. Unsloth Inference (Memory Efficient)

Optimized inference with 4-bit/8-bit quantization support.

from vyvotts.inference.unsloth_inference import VyvoTTSUnslothInference

# Initialize engine (enable 4-bit or 8-bit quantization to reduce memory)
engine = VyvoTTSUnslothInference(
    model_name="Vyvo/VyvoTTS-v2-Neuvillette",
    load_in_4bit=False,  # Set to True for 4-bit quantization
    load_in_8bit=False,  # Set to True for 8-bit quantization
)

# Generate and save audio
audio = engine.generate(
    text="Hey there, my name is Elise.",
    voice=None,
    output_path="output.wav",  # Optional: save directly to file
    max_new_tokens=1200,
    temperature=0.7
)

3. HQQ Quantized Inference (4-bit)

High-quality 4-bit quantization with the gemlite backend for faster inference.

from vyvotts.inference.transformers_hqq_inference import VyvoTTSHQQInference

# Initialize engine with HQQ quantization
engine = VyvoTTSHQQInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette",
    nbits=4,  # Supported bit-widths: 8/4/2/1
    group_size=64
)

# Generate speech
audio, timing_info = engine.generate(
    text="Hello world, this is HQQ inference.",
    voice=None,
    output_path="output.wav",  # Optional: save directly to file
    max_new_tokens=1200,
    temperature=0.6
)

4. vLLM Inference (Fastest)

Production-ready inference with vLLM for maximum throughput.

from vyvotts.inference.vllm_inference import VyvoTTSInference

# Initialize engine
engine = VyvoTTSInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette"
)

# Generate speech
audio = engine.generate(
    text="Hello world, this is vLLM inference.",
    voice="zoe",  # Optional voice identifier
    output_path="output.wav"  # Optional: save directly to file
)
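
For batch workloads, the same engine instance can be reused across many inputs to amortize model load time (a minimal sequential sketch using only the generate call shown above; vLLM can also batch requests internally):

# Synthesize a list of utterances with a single engine instance.
sentences = [
    "First line of dialogue.",
    "Second line of dialogue.",
    "Third line of dialogue.",
]

for i, sentence in enumerate(sentences):
    engine.generate(
        text=sentence,
        voice="zoe",
        output_path=f"line_{i:02d}.wav",
    )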

👨‍🍳 Roadmap

  • Transformers.js support
  • vLLM support
  • Pretrained model release
  • Training and inference code release

🙏 Acknowledgements

We would like to thank the following projects and teams that made this work possible:

  • Orpheus TTS - For foundational TTS research and implementation
  • LiquidAI - For the LFM2 model architecture and pre-trained weights

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
