
VyvoTTS: LLM-Based Text-to-Speech Training Framework 🚀

VyvoTTS Logo

VyvoTTS is a Text-to-Speech (TTS) training and inference framework built on top of large language models (LLMs), developed by the VyvoTTS team.

✨ Features

  • Pre-training: Train LLM-based TTS models from scratch on custom datasets
  • Fine-tuning: Adapt pre-trained models to specific TTS tasks
  • LoRA Adaptation: Memory-efficient fine-tuning via Low-Rank Adaptation
  • Voice Cloning: Clone voices using advanced neural techniques
  • Multi-GPU Support: Distributed training with Hugging Face accelerate

📦 Installation

uv venv --python 3.10
uv pip install -r requirements.txt
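
After installing, a quick environment check can catch missing GPU drivers early (a minimal sketch; it assumes PyTorch is pulled in by requirements.txt):

import torch

# Confirm that a CUDA-capable GPU is visible before launching training.
print(torch.__version__)
print(torch.cuda.is_available())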

🚀 Quick Start

Dataset Preparation

VyvoTTS provides a unified tokenizer that works with both Qwen3 and LFM2 models. The tokenizer reads configuration from YAML files for flexibility.

Tokenizer Usage

from vyvotts.audio_tokenizer import process_dataset

# For Qwen3
process_dataset(
    original_dataset="MrDragonFox/Elise",
    output_dataset="username/dataset-name",
    model_type="qwen3",
    text_field="text"
)

# For LFM2
process_dataset(
    original_dataset="MrDragonFox/Elise",
    output_dataset="username/dataset-name",
    model_type="lfm2",
    text_field="text"
)
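
To sanity-check the result, the processed dataset can be loaded back with the Hugging Face datasets library (a minimal sketch; the exact column names written by process_dataset are an assumption, so inspect the schema rather than relying on specific fields):

from datasets import load_dataset

# Load the processed dataset from the Hub and inspect its schema.
ds = load_dataset("username/dataset-name", split="train")
print(ds.column_names)
print(ds[0])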

Training

Fine-tuning

⚠️ GPU Requirements: Fine-tuning requires a minimum of 30 GB of VRAM.

Configure your fine-tuning parameters in vyvotts/configs/lfm2_ft.yaml and run:

accelerate launch --config_file vyvotts/configs/accelerate_finetune.yaml vyvotts/train.py

💻 For lower-end GPUs (6 GB+): Use the Unsloth FP8/FP4 training notebook:

uv pip install jupyter notebook
uv run jupyter notebook notebook/vyvotts-lfm2-train.ipynb

Pre-training

Configure your pre-training parameters in vyvotts/configs/lfm2_config.yaml and run:

accelerate launch --config_file vyvotts/configs/accelerate_pretrain.yaml vyvotts/train.py

Inference

VyvoTTS provides multiple inference backends optimized for different use cases:

1. Transformers Inference (Standard)

Standard inference using Hugging Face Transformers at full precision.

from vyvotts.inference.transformers_inference import VyvoTTSTransformersInference

# Initialize engine
engine = VyvoTTSTransformersInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette"
)

# Generate speech
audio, timing_info = engine.generate(
    text="Hello, this is a test of the text to speech system.",
    voice=None,  # Optional: specify voice name
    output_path="output.wav",  # Optional: save directly to file
    max_new_tokens=1200,
    temperature=0.6,
    top_p=0.95
)
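
If you prefer to handle the waveform yourself rather than passing output_path, it can be written to disk with soundfile (a sketch, assuming generate returns a NumPy float array; the 24 kHz sample rate is an assumption, so check the engine's actual output rate):

import soundfile as sf

# Write the returned waveform to disk; 24000 Hz is an assumed sample rate.
sf.write("output_manual.wav", audio, 24000)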

2. Unsloth Inference (Memory Efficient)

Optimized inference with 4-bit/8-bit quantization support.

from vyvotts.inference.unsloth_inference import VyvoTTSUnslothInference

# Initialize engine (enable 4-bit or 8-bit quantization to reduce memory)
engine = VyvoTTSUnslothInference(
    model_name="Vyvo/VyvoTTS-v2-Neuvillette",
    load_in_4bit=False,  # Set to True for 4-bit quantization
    load_in_8bit=False,  # Set to True for 8-bit quantization
)

# Generate and save audio
audio = engine.generate(
    text="Hey there, my name is Elise.",
    voice=None,
    output_path="output.wav",  # Optional: save directly to file
    max_new_tokens=1200,
    temperature=0.7
)

3. HQQ Quantized Inference (4-bit)

High-quality 4-bit quantization with the gemlite backend for faster inference.

from vyvotts.inference.transformers_hqq_inference import VyvoTTSHQQInference

# Initialize engine with HQQ quantization
engine = VyvoTTSHQQInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette",
    nbits=4,  # Supported bit-widths: 8/4/2/1
    group_size=64
)

# Generate speech
audio, timing_info = engine.generate(
    text="Hello world, this is HQQ inference.",
    voice=None,
    output_path="output.wav",  # Optional: save directly to file
    max_new_tokens=1200,
    temperature=0.6
)

4. vLLM Inference (Fastest)

Production-ready inference with vLLM for maximum throughput.

from vyvotts.inference.vllm_inference import VyvoTTSInference

# Initialize engine
engine = VyvoTTSInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette"
)

# Generate speech
audio = engine.generate(
    text="Hello world, this is vLLM inference.",
    voice="zoe",  # Optional voice identifier
    output_path="output.wav"  # Optional: save directly to file
)
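
For batch workloads, the same engine instance can be reused across many inputs to amortize model load time (a minimal sequential sketch using only the generate call shown above; vLLM can also batch requests internally):

# Synthesize a list of utterances with a single engine instance.
sentences = [
    "First line of dialogue.",
    "Second line of dialogue.",
    "Third line of dialogue.",
]

for i, sentence in enumerate(sentences):
    engine.generate(
        text=sentence,
        voice="zoe",
        output_path=f"line_{i:02d}.wav",
    )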

👨‍🍳 Roadmap

  • Transformers.js support
  • vLLM support
  • Pretrained model release
  • Training and inference code release

🙏 Acknowledgements

We would like to thank the following projects and teams that made this work possible:

  • Orpheus TTS - For foundational TTS research and implementation
  • LiquidAI - For the LFM2 model architecture and pre-trained weights

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
