A graphical interface for the Kokoro-82M text-to-speech model, providing an easy way to generate high-quality speech with various voice options.
- Multiple voice options (American/British English)
- Real-time audio generation
- Audio playback controls (play/pause/stop)
- Save generated audio as WAV files
- Progress indicators during generation
- Automatic model download and setup
- Windows 10/11
- Python 3.8+
- eSpeak NG installed at C:\Program Files\eSpeak NG
- NVIDIA GPU with CUDA support (optional but recommended)
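The Python and OS requirements above can be checked before installing anything. The following is a minimal sketch and not part of the project. `check_environment` is a hypothetical helper. It only checks the OS family, because `platform.system()` reports "Windows" without distinguishing 10 from 11.

```python
import platform
import sys

# Minimum Python version taken from the requirements list above.
MIN_PYTHON = (3, 8)


def check_environment(version_info=sys.version_info, system=None):
    """Return a list of human-readable problems; an empty list means OK."""
    system = system if system is not None else platform.system()
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # Only the OS family is checked; Windows 10 vs 11 is not distinguished.
    if system != "Windows":
        problems.append(f"Windows 10/11 is expected, found {system}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```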
- Install eSpeak NG:
  - Download the version 1.51 x64 .msi installer from the eSpeak NG releases page
  - Install it to C:\Program Files\eSpeak NG
- Install Python dependencies:
  pip install torch soundfile pygame phonemizer
- Clone this repository:
  git clone https://github.com/AmitTzah/TTS-kokoro
  cd TTS-kokoro
- Run the setup script:
  python local-tts-setup.py
- Launch the GUI:
  python tts-gui.pyw
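If the GUI fails to start with an import error, a quick check can confirm that the pip dependencies installed above are visible to the interpreter you are using. This is a minimal sketch, not part of the project. `missing_modules` is a hypothetical helper. It uses `importlib.util.find_spec`, which locates a top-level module without importing it.

```python
import importlib.util

# Import names of the packages installed with pip above. For these four
# packages the import name matches the pip package name.
REQUIRED_MODULES = ("torch", "soundfile", "pygame", "phonemizer")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found by the current interpreter."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages, run: pip install " + " ".join(missing))
    else:
        print("All Python dependencies are installed.")
```

Run it with the same `python` you use to launch tts-gui.pyw. Packages installed into a different interpreter or virtual environment will show up as missing.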
- Select a voice from the dropdown menu
- Enter text in the input box
- Click "Generate Audio" to create speech
- Use the playback controls to listen to the generated audio
- Save the audio using the "Save" button
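The "Save" step writes the generated audio to a WAV file. Since soundfile is a dependency, the GUI likely uses it, but that is not stated here. The following stdlib-only sketch shows what saving float audio as 16-bit PCM WAV involves. `save_wav` is a hypothetical helper, and the 24 kHz sample rate is an assumption about Kokoro-82M's output that should be checked against the model code.

```python
import struct
import wave

# Assumed output sample rate of Kokoro-82M; verify against the model/GUI code.
SAMPLE_RATE = 24_000


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the int16 cast cannot overflow
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))


if __name__ == "__main__":
    import math

    # One second of a 440 Hz test tone at half amplitude.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
    save_wav("test_tone.wav", tone)
```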
The GUI provides the following voices:
- af (Default - 50/50 mix of Bella & Sarah)
- af_bella
- af_nicole
- af_sarah
- af_sky
- bf_emma
- bf_isabella
- bm_george
- bm_lewis
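The voice IDs appear to follow a prefix convention. The first letter is a (American English) or b (British English), the second is f (female) or m (male), and the speaker name follows the underscore. The sketch below decodes an ID under that assumption. `describe_voice` is a hypothetical helper, not something the GUI exposes.

```python
ACCENTS = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id):
    """Decode a voice ID such as 'bf_emma' into accent, gender and speaker."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"unrecognized voice ID: {voice_id!r}")
    return {
        "accent": ACCENTS[prefix[0]],
        "gender": GENDERS[prefix[1]],
        # 'af' has no speaker suffix: it is the Bella/Sarah mix.
        "speaker": name.capitalize() if name else "mix",
    }


if __name__ == "__main__":
    for vid in ("af", "af_bella", "bm_george"):
        print(vid, describe_voice(vid))
```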
- Ensure eSpeak NG is installed at C:\Program Files\eSpeak NG
- Verify the following files exist:
  C:\Program Files\eSpeak NG\libespeak-ng.dll
  C:\Program Files\eSpeak NG\espeak-ng.exe
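The file check above can be scripted. This is a minimal sketch, not the project's code. `missing_espeak_files` and `configure_phonemizer` are hypothetical helpers. The second helper sets phonemizer's `PHONEMIZER_ESPEAK_LIBRARY` environment variable to the DLL path, a common fix when phonemizer cannot find eSpeak NG on Windows. Whether the GUI configures phonemizer this way is not stated here.

```python
import os
from pathlib import Path

# Install location from the requirements above.
ESPEAK_DIR = Path(r"C:\Program Files\eSpeak NG")
REQUIRED_FILES = ("libespeak-ng.dll", "espeak-ng.exe")


def missing_espeak_files(install_dir=ESPEAK_DIR):
    """Return the required eSpeak NG files that are not present."""
    return [name for name in REQUIRED_FILES if not (Path(install_dir) / name).is_file()]


def configure_phonemizer(install_dir=ESPEAK_DIR):
    """Point phonemizer at the eSpeak NG DLL via its environment variable.

    Must run before phonemizer loads its espeak backend.
    """
    os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = str(Path(install_dir) / "libespeak-ng.dll")


if __name__ == "__main__":
    missing = missing_espeak_files()
    if missing:
        print("eSpeak NG files not found:", ", ".join(missing))
    else:
        configure_phonemizer()
        print("eSpeak NG found; phonemizer configured.")
```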
If model files fail to download:
- Check your internet connection
- Try running the setup script again:
  python local-tts-setup.py
- If you have an NVIDIA GPU, ensure CUDA is properly installed
- The GUI will automatically use CUDA if available
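"Use CUDA if available" usually comes down to a check like the one below. This is a minimal sketch of the common PyTorch pattern, not necessarily the GUI's exact code. `select_device` is a hypothetical helper, and the `ImportError` fallback only lets the sketch run where PyTorch is absent.

```python
try:
    import torch
except ImportError:  # lets the sketch run even where PyTorch is absent
    torch = None


def select_device():
    """Return 'cuda' when PyTorch can see a CUDA GPU, otherwise 'cpu'."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    device = select_device()
    print("Using device:", device)
    if device == "cuda":
        print("GPU:", torch.cuda.get_device_name(0))
```

If this prints "cpu" on a machine with an NVIDIA GPU, a common cause is a CPU-only PyTorch build. In that case `torch.version.cuda` is `None`.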
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
The Kokoro-82M model is licensed under Apache 2.0. eSpeak NG is licensed under GPLv3.