VyvoTTS, developed by the VyvoTTS team, is a Text-to-Speech (TTS) training and inference framework built on top of LLM backbones.
- Pre-training: Train LLM models from scratch with custom datasets
- Fine-tuning: Adapt pre-trained models for specific TTS tasks
- LoRA Adaptation: Memory-efficient fine-tuning using Low-Rank Adaptation
- Voice Cloning: Clone voices using advanced neural techniques
- Multi-GPU Support: Distributed training with accelerate
```bash
uv venv --python 3.10
uv pip install -r requirements.txt
```
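`uv venv` creates a local `.venv` directory; if you want to run commands directly rather than through `uv run`, activate it first:

```bash
source .venv/bin/activate
```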
VyvoTTS provides a unified tokenizer that works with both Qwen3 and LFM2 models. The tokenizer reads configuration from YAML files for flexibility.
```python
from vyvotts.audio_tokenizer import process_dataset

# For Qwen3
process_dataset(
    original_dataset="MrDragonFox/Elise",
    output_dataset="username/dataset-name",
    model_type="qwen3",
    text_field="text"
)

# For LFM2
process_dataset(
    original_dataset="MrDragonFox/Elise",
    output_dataset="username/dataset-name",
    model_type="lfm2",
    text_field="text"
)
```
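Assuming `output_dataset` is pushed to the Hugging Face Hub (the `username/dataset-name` form suggests this), you can sanity-check the processed result with `datasets`:

```python
from datasets import load_dataset

# Load the processed dataset back to inspect what process_dataset produced
ds = load_dataset("username/dataset-name", split="train")
print(ds)            # columns and row count
print(ds[0].keys())  # fields in a single example
```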
Configure your fine-tuning parameters in `vyvotts/configs/lfm2_ft.yaml` and run:
```bash
accelerate launch --config_file vyvotts/configs/accelerate_finetune.yaml vyvotts/train.py
```
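As a rough sketch of what such a config carries (the key names below are illustrative, not the file's actual schema; consult `lfm2_ft.yaml` in the repo for the real keys):

```yaml
# Illustrative only -- check vyvotts/configs/lfm2_ft.yaml for the real keys
model_name: Vyvo/VyvoTTS-LFM2-Neuvillette
dataset_name: username/dataset-name   # tokenized dataset from the previous step
num_epochs: 3
batch_size: 4
learning_rate: 2.0e-5
```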
💻 **For lower-end GPUs (6GB+):** Use the Unsloth FP8/FP4 training notebook:
```bash
uv pip install jupyter notebook
uv run jupyter notebook notebook/vyvotts-lfm2-train.ipynb
```
Configure your pre-training parameters in `vyvotts/configs/lfm2_config.yaml` and run:
```bash
accelerate launch --config_file vyvotts/configs/accelerate_pretrain.yaml vyvotts/train.py
```
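Distributed multi-GPU runs are driven by the accelerate config passed via `--config_file`. A minimal single-node multi-GPU config in accelerate's standard format (the values are illustrative; the repo ships its own `accelerate_pretrain.yaml`):

```yaml
# Illustrative single-node multi-GPU accelerate config
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
num_processes: 4        # number of GPUs
mixed_precision: bf16
```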
VyvoTTS provides multiple inference backends optimized for different use cases:
Standard inference using HuggingFace Transformers with full precision.
```python
from vyvotts.inference.transformers_inference import VyvoTTSTransformersInference

# Initialize engine
engine = VyvoTTSTransformersInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette"
)

# Generate speech
audio, timing_info = engine.generate(
    text="Hello, this is a test of the text to speech system.",
    voice=None,                # Optional: specify voice name
    output_path="output.wav",  # Optional: save directly to file
    max_new_tokens=1200,
    temperature=0.6,
    top_p=0.95
)
```
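If you omit `output_path`, the returned waveform can be saved manually. A minimal sketch assuming `audio` comes back as a 1-D NumPy float array at 24 kHz (the sample rate is an assumption, not stated in this README; verify against the model card):

```python
import soundfile as sf

# Assumes `audio` is a 1-D NumPy float array; 24000 Hz is an assumed sample rate
sf.write("output_manual.wav", audio, samplerate=24000)
print(f"Wrote {len(audio) / 24000:.2f}s of audio; timing: {timing_info}")
```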
Optimized inference with 4-bit/8-bit quantization support.
```python
from vyvotts.inference.unsloth_inference import VyvoTTSUnslothInference

# Initialize engine with 4-bit quantization
engine = VyvoTTSUnslothInference(
    model_name="Vyvo/VyvoTTS-v2-Neuvillette",
    load_in_4bit=True,   # 4-bit quantization for lower memory
    load_in_8bit=False,  # enable at most one of the two flags
)

# Generate and save audio
audio = engine.generate(
    text="Hey there, my name is Elise.",
    voice=None,
    output_path="output.wav",  # Optional: save directly to file
    max_new_tokens=1200,
    temperature=0.7
)
```
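The same constructor can load 8-bit weights instead, trading a little memory back for quality:

```python
# Alternative: 8-bit loading -- enable only one quantization flag at a time
engine_8bit = VyvoTTSUnslothInference(
    model_name="Vyvo/VyvoTTS-v2-Neuvillette",
    load_in_8bit=True,
)
```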
High-quality low-bit quantization (HQQ, 8/4/2/1-bit) with the gemlite backend for faster inference.
```python
from vyvotts.inference.transformers_hqq_inference import VyvoTTSHQQInference

# Initialize engine with HQQ quantization
engine = VyvoTTSHQQInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette",
    nbits=8,        # 8/4/2/1-bit quantization
    group_size=64
)

# Generate speech
audio, timing_info = engine.generate(
    text="Hello world, this is HQQ inference.",
    voice=None,
    output_path="output.wav",  # Optional: save directly to file
    max_new_tokens=1200,
    temperature=0.6
)
```
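Lower `nbits` shrinks memory further at some quality cost; a 4-bit variant of the same constructor:

```python
# Alternative: 4-bit HQQ -- roughly half the weight memory of the 8-bit setup above
engine_4bit = VyvoTTSHQQInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette",
    nbits=4,
    group_size=64
)
```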
Production-ready inference with vLLM for maximum throughput.
```python
from vyvotts.inference.vllm_inference import VyvoTTSInference

# Initialize engine
engine = VyvoTTSInference(
    model_name="Vyvo/VyvoTTS-LFM2-Neuvillette"
)

# Generate speech
audio = engine.generate(
    text="Hello world, this is vLLM inference.",
    voice="zoe",               # Optional voice identifier
    output_path="output.wav"   # Optional: save directly to file
)
```
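For simple batch jobs, the same call can be looped over multiple prompts (a sketch using only the `generate()` parameters shown above):

```python
# Batch sketch using only the generate() parameters documented above
texts = [
    "First sentence to synthesize.",
    "Second sentence to synthesize.",
]
for i, text in enumerate(texts):
    engine.generate(
        text=text,
        voice="zoe",
        output_path=f"output_{i}.wav",
    )
```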
- Transformers.js support
- vLLM support
- Pretrained model release
- Training and inference code release
We would like to thank the following projects and teams that made this work possible:
- Orpheus TTS - For foundational TTS research and implementation
- LiquidAI - For the LFM2 model architecture and pre-trained weights
This project is licensed under the MIT License - see the LICENSE file for details.