GitHub - wildminder/ComfyUI-Chatterbox: ComfyUI Chatterbox TTS & Voice Conversion Node

ComfyUI Chatterbox

High-quality Text-to-Speech (TTS) and Voice Conversion (VC) nodes for ComfyUI, powered by Resemble AI's Chatterbox model.


Table of Contents

About The Project
- Major Update Notice
- Features
Getting Started
- Installation
Usage
- Node Parameters Explained
Roadmap
Contributing
Acknowledgments

About The Project

This project provides ComfyUI custom nodes for Resemble AI's Chatterbox library. The nodes bring Text-to-Speech and Voice Conversion directly into a workflow and integrate with ComfyUI's model management system for efficient VRAM usage.


Major Update Notice

Note

Version 1.2.0 has been deeply refactored for better performance, stability, and alignment with the ComfyUI codebase. All parameters have been unlocked.


Features

- Long generation: No longer limited to 40 seconds.
- Chatterbox TTS Node: Synthesize speech from text with optional voice cloning from an audio prompt.
- Chatterbox Voice Conversion Node: Convert the voice in a source audio file to a target voice.
- Automatic Model Downloading: Models are automatically downloaded from Hugging Face on first use.
- Efficient VRAM Management: Full integration with ComfyUI's model patcher system to load models to GPU only when needed and offload them afterward.
- Detailed Generation Control: Fine-tune your audio output with parameters for speed, expressiveness, creativity, and quality.
- Accurate Progress Bars: Both console and UI progress bars reflect the true step-by-step generation process.


Getting Started

Installation

Install via ComfyUI Manager (Recommended):
- Search for ComfyUI-Chatterbox in the ComfyUI Manager and install it.

Manual Installation:

Clone this repository into your ComfyUI/custom_nodes/ directory. The command below assumes you run it from the directory that contains ComfyUI/:

git clone https://github.com/wildminder/ComfyUI-Chatterbox.git ComfyUI/custom_nodes/ComfyUI-Chatterbox

Install Dependencies:
- Navigate to the new directory and install the required packages:
```
cd ComfyUI/custom_nodes/ComfyUI-Chatterbox
pip install -r requirements.txt
```
Model Management:

Important

For users of previous versions: This update changes the model directory. You must manually delete your old model folder to avoid conflicts:

Delete this folder: ComfyUI/models/chatterbox_tts/

The new version will automatically download models to the correct ComfyUI-standard directory: ComfyUI/models/tts/chatterbox/.

Restart ComfyUI.
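
The manual cleanup of the old model folder described above can also be scripted. The following is a minimal sketch, not part of this repository: the helper name `remove_legacy_model_dir` is hypothetical, and it assumes you pass the path to your ComfyUI root. It only touches ComfyUI/models/chatterbox_tts/ and leaves the new ComfyUI/models/tts/chatterbox/ directory alone.

```python
import shutil
from pathlib import Path


def remove_legacy_model_dir(comfy_root: str) -> bool:
    """Delete <comfy_root>/models/chatterbox_tts/ if present.

    Returns True if the legacy folder was removed, False if it did not exist.
    The helper name and signature are illustrative, not part of the node pack.
    """
    legacy = Path(comfy_root) / "models" / "chatterbox_tts"
    if not legacy.is_dir():
        return False
    # Recursively delete the old folder and everything inside it.
    shutil.rmtree(legacy)
    return True
```

For example, `remove_legacy_model_dir("ComfyUI")` run from the directory that contains ComfyUI/ removes the old folder once; a second call returns False because nothing is left to delete.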


Usage

After installation, you will find two new nodes:

- Chatterbox TTS 📢 under the audio/generation category.
- Chatterbox Voice Conversion 🗣️ under the audio/generation category.

Load an example workflow from the workflow-examples/ directory in this repository to get started.

Node Parameters Explained

Chatterbox TTS 📢 Parameters

- max_new_tokens: Maximum number of audio tokens to generate. Acts as a failsafe against run-on generations. About 25 tokens correspond to 1 second of audio. The model's hard limit is 4096 tokens (≈ 164 seconds).
- flow_cfg_scale: CFG scale for the mel spectrogram decoder. Higher values increase adherence to the text content and speaker timbre but may reduce naturalness.
- exaggeration: Controls expressiveness and emotional intensity. Higher values lead to more exaggerated prosody.
- temperature: Controls the randomness of the token sampling process. Higher values produce more diverse and creative speech, while lower values are more deterministic.
- cfg_weight: Classifier-Free Guidance (CFG) weight for the token sampling process.
- repetition_penalty: Penalizes repeated tokens to discourage monotonous or repetitive speech. 1.0 means no penalty.
- min_p / top_p: Sampling filters that control the pool of tokens the model can choose from at each step. top_p (nucleus sampling) keeps the smallest set of most likely tokens whose cumulative probability reaches top_p; min_p discards tokens whose probability falls below min_p times that of the most likely token.
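
To make two of the parameters above more concrete, here is a small, hedged sketch in plain Python. The first part turns a target duration into a `max_new_tokens` value using the figures quoted above (about 25 tokens per second, 4096-token limit). The second part illustrates the general textbook definitions of min-p and top-p filtering on a toy probability list. The function names are hypothetical, and the order in which the two filters are applied is an assumption; this is not the code the node or the Chatterbox library runs.

```python
import math

TOKENS_PER_SECOND = 25    # figure quoted in this README (approximate)
MODEL_TOKEN_LIMIT = 4096  # hard limit quoted in this README


def tokens_for_duration(seconds: float) -> int:
    """Suggest a max_new_tokens value for `seconds` of audio, clamped to the limit."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return min(math.ceil(seconds * TOKENS_PER_SECOND), MODEL_TOKEN_LIMIT)


def duration_for_tokens(tokens: int) -> float:
    """Approximate audio length in seconds for a given token count."""
    return tokens / TOKENS_PER_SECOND


def candidate_pool(probs, top_p=1.0, min_p=0.0):
    """Return indices of tokens that survive min-p, then top-p, filtering.

    `probs` is a list of token probabilities. Illustrative only: the filter
    order here is an assumption, not taken from the Chatterbox source.
    """
    if not probs:
        return []
    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # min-p: drop tokens far less likely than the top token.
    threshold = min_p * probs[ranked[0]]
    ranked = [i for i in ranked if probs[i] >= threshold]
    # top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    total = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / total
        if cumulative >= top_p:
            break
    return kept
```

With these definitions, `tokens_for_duration(10)` gives 250, and any request longer than the limit allows is clamped to 4096. Lowering `top_p` or raising `min_p` shrinks the list returned by `candidate_pool`, which is what makes sampling more conservative.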

Chatterbox Voice Conversion 🗣️ Parameters

- n_timesteps: Number of diffusion steps for the flow matching process. Higher values can improve quality but will take longer to generate.
- temperature: Controls the randomness of the initial noise for the diffusion process. 1.0 is standard. Lower values are more deterministic; higher values are more random.
- flow_cfg_scale: CFG scale for the mel spectrogram decoder. Higher values increase adherence to the target voice's timbre but may reduce the naturalness of the speech prosody.
- target_voice_audio: The audio file containing the target voice timbre. If not provided, the default voice from the selected model pack will be used.
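
How `n_timesteps`, `temperature`, and `flow_cfg_scale` interact can be illustrated with a toy one-dimensional example. This is a sketch under stated assumptions, not the solver Chatterbox uses: it assumes a fixed-step Euler integrator from t=0 to t=1, noise scaled by `temperature` as the starting point, and a common classifier-free guidance blend of a conditional and an unconditional velocity. All function names are hypothetical.

```python
import random


def initial_noise(temperature, rng):
    """Starting point of the flow: standard Gaussian noise scaled by temperature."""
    return temperature * rng.gauss(0.0, 1.0)


def guided_velocity(cond, uncond, flow_cfg_scale):
    """Blend two velocity fields with a common CFG formulation (an assumption).

    A scale of 0.0 returns the conditional velocity unchanged; larger scales
    push further away from the unconditional prediction.
    """
    def velocity(x, t):
        c, u = cond(x, t), uncond(x, t)
        return c + flow_cfg_scale * (c - u)
    return velocity


def euler_flow(x0, velocity, n_timesteps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 in n_timesteps Euler steps."""
    if n_timesteps < 1:
        raise ValueError("n_timesteps must be at least 1")
    x, dt = x0, 1.0 / n_timesteps
    for step in range(n_timesteps):
        x += dt * velocity(x, step * dt)
    return x
```

As a usage example, take `cond = lambda x, t: 2.0 * (1.0 - x)` and `uncond = lambda x, t: -2.0 * x`. Integrating with 50 steps lands closer to the exact solution of the differential equation than integrating with 4 steps, which mirrors the quality versus time trade-off of `n_timesteps`. Raising `flow_cfg_scale` above 0 moves the result further toward, and eventually past, the conditional target, a toy analogue of stronger timbre adherence at the cost of naturalness.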


Acknowledgments

This node would not be possible without the incredible Chatterbox library by Resemble AI.
README template adapted from the Best-README-Template.
