
High-quality Text-to-Speech (TTS) and Voice Conversion (VC) nodes for ComfyUI, powered by Resemble AI's Chatterbox model.
ComfyUI custom nodes for the powerful Resemble AI Chatterbox library. It enables seamless in-workflow Text-to-Speech and Voice Conversion, complete with deep integration into ComfyUI's model management system for efficient VRAM usage.
Note
- 1.2.0: This version has been deeply refactored for better performance, stability, and alignment with the ComfyUI codebase. All parameters have been unlocked.
- Long generation: No longer limited to 40 seconds.
- Chatterbox TTS Node: Synthesize speech from text with optional voice cloning from an audio prompt.
- Chatterbox Voice Conversion Node: Convert the voice in a source audio file to a target voice.
- Automatic Model Downloading: Models are automatically downloaded from Hugging Face on first use.
- Efficient VRAM Management: Full integration with ComfyUI's model patcher system to load models to GPU only when needed and offload them afterward.
- Detailed Generation Control: Fine-tune your audio output with parameters for speed, expressiveness, creativity, and quality.
- Accurate Progress Bars: Both console and UI progress bars reflect the true step-by-step generation process.
- Install via ComfyUI Manager (Recommended): Search for ComfyUI-Chatterbox in the ComfyUI Manager and install it.
- Manual Installation: Clone this repository into your ComfyUI/custom_nodes/ directory:
  git clone https://github.com/wildminder/ComfyUI-Chatterbox.git ComfyUI/custom_nodes/ComfyUI-Chatterbox
- Install Dependencies: Navigate to the new directory and install the required packages:
  cd ComfyUI/custom_nodes/ComfyUI-Chatterbox
  pip install -r requirements.txt
- Model Management:
  Important
  For users of previous versions: This update changes the model directory. You must manually delete your old model folder to avoid conflicts.
  Delete this folder: ComfyUI/models/chatterbox_tts/
  The new version will automatically download models to the correct ComfyUI-standard directory: ComfyUI/models/tts/chatterbox/
- Restart ComfyUI.
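The manual cleanup of the old model folder described above can also be scripted. The following is a minimal sketch, not part of this repository: the helper name `remove_legacy_chatterbox_models` is hypothetical, and the ComfyUI root path is an assumption you must supply for your own installation.

```python
import shutil
from pathlib import Path


def remove_legacy_chatterbox_models(comfyui_root: str) -> bool:
    """Delete the pre-update model folder, if it exists.

    `comfyui_root` is an assumption: the path to your own ComfyUI directory.
    Returns True when a folder was removed, False when there was nothing to do.
    """
    legacy_dir = Path(comfyui_root) / "models" / "chatterbox_tts"
    if not legacy_dir.is_dir():
        return False
    # Remove the old folder and everything inside it.
    shutil.rmtree(legacy_dir)
    return True
```

Calling `remove_legacy_chatterbox_models("ComfyUI")` from the directory that contains your ComfyUI folder would remove ComfyUI/models/chatterbox_tts/ and leave the new ComfyUI/models/tts/chatterbox/ location untouched.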
After installation, you will find two new nodes:
- Chatterbox TTS 📢 under the audio/generation category.
- Chatterbox Voice Conversion 🗣️ under the audio/generation category.
Load an example workflow from the workflow-examples/ directory in this repository to get started.
Chatterbox TTS node parameters:
- max_new_tokens: Maximum number of audio tokens to generate. Acts as a failsafe against run-on generations. 25 tokens is approximately 1 second of audio. The model's hard limit is 4096 tokens (≈ 163 seconds).
- flow_cfg_scale: CFG scale for the mel spectrogram decoder. Higher values increase adherence to the text content and speaker timbre but may reduce naturalness.
- exaggeration: Controls the expressiveness and emotional intensity. Higher values lead to more exaggerated prosody.
- temperature: Controls the randomness of the token sampling process. Higher values produce more diverse and creative speech, while lower values are more deterministic.
- cfg_weight: Classifier-Free Guidance (CFG) weight for the token sampling process.
- repetition_penalty: Penalizes repeated tokens to discourage monotonous or repetitive speech. 1.0 means no penalty.
- min_p / top_p: Thresholds for min-p and nucleus (top-p) sampling, controlling the pool of tokens the model can choose from at each step.
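The token-to-duration arithmetic given for max_new_tokens can be sketched as a small helper. This is only an illustration built from the two figures stated above (about 25 tokens per second and a hard limit of 4096 tokens); the function names are hypothetical and not part of the node's API.

```python
import math

TOKENS_PER_SECOND = 25  # approximate rate from the parameter description
MAX_TOKENS = 4096       # the model's hard limit from the parameter description


def tokens_for_duration(seconds: float) -> int:
    """Return a max_new_tokens value covering `seconds` of audio, capped at the hard limit."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return min(math.ceil(seconds * TOKENS_PER_SECOND), MAX_TOKENS)


def duration_for_tokens(tokens: int) -> float:
    """Return the approximate audio length in seconds for a token budget."""
    return tokens / TOKENS_PER_SECOND
```

For instance, `tokens_for_duration(40)` yields a budget of 1000 tokens, and any request beyond the hard limit is clamped to 4096. Because the rate is approximate, leaving some headroom above the computed value is a reasonable choice.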
Chatterbox Voice Conversion node parameters:
- n_timesteps: Number of diffusion steps for the flow matching process. Higher values can improve quality but will take longer to generate.
- temperature: Controls the randomness of the initial noise for the diffusion process. 1.0 is standard. Lower values are more deterministic; higher values are more random.
- flow_cfg_scale: CFG scale for the mel spectrogram decoder. Higher values increase adherence to the target voice's timbre but may reduce the naturalness of the speech prosody.
- target_voice_audio: The audio file containing the target voice timbre. If not provided, the default voice from the selected model pack will be used.
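To give a feel for what a CFG scale such as flow_cfg_scale does, here is a toy sketch of one common classifier-free guidance formulation, which blends a conditioned and an unconditioned prediction. Whether Chatterbox uses exactly this form is an assumption; the function name `apply_cfg` and the example values are hypothetical and only illustrate the general idea.

```python
def apply_cfg(conditioned, unconditioned, scale):
    """Blend two predictions element-wise as (1 + scale) * cond - scale * uncond.

    A scale of 0 returns the conditioned prediction unchanged; larger scales
    push the result further away from the unconditioned prediction.
    """
    return [(1.0 + scale) * c - scale * u for c, u in zip(conditioned, unconditioned)]


# Toy values standing in for model outputs (not real Chatterbox data).
cond = [0.8, -0.2, 0.5]
uncond = [0.5, 0.0, 0.5]

for scale in (0.0, 0.5, 1.0):
    print(scale, apply_cfg(cond, uncond, scale))
```

Where the two predictions agree (the last element here), the scale has no effect; where they differ, a higher scale exaggerates the difference, which matches the described trade-off between adherence and naturalness.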
- This node would not be possible without the incredible Chatterbox library by Resemble AI.
- README template adapted from the Best-README-Template.