Streaming inference implementation for Qwen3-TTS that the official repo doesn't provide.
The official team advertises "Extreme Low-Latency Streaming Generation" in the paper and marketing, but the actual streaming code was never released - users are instead pointed to vLLM-Omni, which still does not support online serving.
This fork adds real streaming generation directly to the qwen-tts package.
In addition to real streaming, this fork delivers an ~6x inference speedup over upstream qwen-tts, in both streaming and non-streaming generation.
- `stream_generate_pcm()` - real-time PCM audio streaming
- `stream_generate_voice_clone()` - streaming with voice cloning
See `examples/` for usage.
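A minimal consumer pattern for chunked PCM output is sketched below. The stub generator stands in for `stream_generate_pcm()` - the real call, model loading, and its parameters are not shown here and should be taken from `examples/`; only the chunk-consuming side is illustrated.

```python
import io
import struct
import wave

SAMPLE_RATE = 24000  # assumed output sample rate; check the model config

def fake_stream_generate_pcm(n_chunks=3, chunk_samples=4000):
    """Stub standing in for the fork's stream_generate_pcm():
    yields raw 16-bit mono PCM chunks as they are 'generated'."""
    for _ in range(n_chunks):
        # silence here; a real chunk holds synthesized speech samples
        yield struct.pack(f"<{chunk_samples}h", *([0] * chunk_samples))

def consume_to_wav(pcm_chunks, path_or_buf, sample_rate=SAMPLE_RATE):
    """Write chunks to a WAV file as they arrive; a real consumer could
    instead push each chunk to an audio device or over a websocket."""
    total_samples = 0
    with wave.open(path_or_buf, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit PCM
        wf.setframerate(sample_rate)
        for chunk in pcm_chunks:
            wf.writeframes(chunk)
            total_samples += len(chunk) // 2
    return total_samples

buf = io.BytesIO()
n = consume_to_wav(fake_stream_generate_pcm(), buf)
print(n)  # 12000 samples = 0.5 s at 24 kHz
```

The same loop works for `stream_generate_voice_clone()`, since both yield audio incrementally rather than returning one final waveform.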
Note: torch versions differ between Linux/Windows due to available flash_attn prebuilt wheels.
Linux:

```bash
sudo apt install sox libsox-fmt-all
```

Windows:

```bash
# Download SoX from https://sourceforge.net/projects/sox/ and add it to PATH
```

```bash
conda create -n qwen3-tts python=3.12 -y
conda activate qwen3-tts
```

Linux:

```bash
pip install torch==2.9.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu130
pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.6.8/flash_attn-2.8.3%2Bcu130torch2.9-cp312-cp312-linux_x86_64.whl
```

Windows:

```bash
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu130
pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.12/flash_attn-2.8.3%2Bcu130torch2.10-cp312-cp312-win_amd64.whl
pip install -U "triton-windows<3.7"
```

```bash
git clone https://github.com/dffdeeq/Qwen3-TTS-streaming.git
cd Qwen3-TTS-streaming
```
```bash
pip install -e .
```

| Parameter | Default | Description |
|---|---|---|
| `emit_every_frames` | 4 | Emit audio every N frames (~0.33s at 12Hz) |
| `decode_window_frames` | 80 | Decoder context window |
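How these defaults translate into chunk latency and decoder context, using the 12Hz frame rate stated in the table (whether the two names are passed as keyword arguments to the streaming functions is an assumption; check the function signatures in this fork):

```python
FRAME_RATE_HZ = 12            # codec frame rate from the table above
emit_every_frames = 4         # default: emit an audio chunk every 4 frames
decode_window_frames = 80     # default: decoder attends to the last 80 frames

chunk_seconds = emit_every_frames / FRAME_RATE_HZ
window_seconds = decode_window_frames / FRAME_RATE_HZ
print(f"{chunk_seconds:.2f}s per emitted chunk")    # ~0.33s
print(f"{window_seconds:.2f}s of decoder context")  # ~6.67s
```

Lowering `emit_every_frames` reduces per-chunk latency at the cost of more decode calls; a larger `decode_window_frames` gives the decoder more context per call but more work per chunk.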
From the official Qwen3-TTS README:

> Now only offline inference is supported. Online serving will be supported later.
This fork provides streaming now, without waiting for vLLM-Omni updates.
Based on QwenLM/Qwen3-TTS