Kitten TTS is an open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis.
Currently in developer preview (Original author note)
- Ultra-lightweight: Model size less than 25MB
- CPU-optimized: Runs without GPU on any device
- High-quality voices: Several premium voice options available
- Fast inference: Optimized for real-time speech synthesis
- CLI
- No hard espeak dependency
- Option of using any phonemizer
# Build package using Nix package manager
nix build
# Create development environment using Nix package manager
nix develop
let model = crate::KittenModel::model_builtin(crate::KittenVoice::default());
let inference = model
.unwrap()
.generate("This high quality TTS model works without a GPU".to_string());
let (waveform, _) = inference.unwrap();
let wav = crate::wav::save_array1_f32_as_wav(&waveform, "out/out.wav", None);