
v0.16.0


@ZachNagengast ZachNagengast released this 03 Mar 02:49

Highlights

This release introduces TTSKit, a brand-new optional library that brings high-quality, on-device text-to-speech using the latest Core ML features, such as MLState and MLTensor, for optimal inference on the Apple Neural Engine.

With this first release, we're launching the Qwen3-TTS CustomVoice models (0.6B and 1.7B) with instruction control, and more are on the way in future releases, including voice cloning.

Download, load, generate, and stream playback in 3 lines of code:

import TTSKit

let ttsKit = try await TTSKit()
try await ttsKit.play(text: "Hello from TTSKit!")

Key Features

  • Real-time adaptive streaming
    • Playback begins while audio is still generating, minimizing the time from text input to first audio buffer output
    • .auto mode adapts to the inference speed of the device for consistent, smooth playback
  • 9 built-in voices
  • 10 languages
  • Style instruction support (1.7B model only)
  • Automatic chunking for long-form inputs
  • Audio file exports in WAV/M4A format with optional metadata
  • Modular protocol-based architecture (6 swappable Core ML components) for easy customization and future model adoption
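To make the streaming feature concrete, here's a minimal sketch of what opting into the adaptive mode might look like. Only `TTSKit()` and `play(text:)` are confirmed in these notes; the commented `streamingMode:` label is a hypothetical parameter name for illustration — check the README for the actual signatures.

```swift
import TTSKit

// Confirmed quick-start API from this release:
let ttsKit = try await TTSKit()

// Playback starts while audio is still generating. The commented
// parameter below is a hypothetical name sketching how the `.auto`
// adaptive mode described above might be selected:
try await ttsKit.play(
    text: "Hello from TTSKit!"
    // , streamingMode: .auto  // adapts chunking to device inference speed
)
```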

See the new TTSKit section in the README.md for full API docs, model selection, and advanced usage.

CLI

Try it out with the following command:

swift run -c release whisperkit-cli tts --text "Hello from TTSKit" --play

It's also available via Homebrew upon release:

brew install whisperkit-cli
whisperkit-cli tts --text "Hello from TTSKit" --play

The CLI gives full control over speaker, language, model variant, style, temperature, chunking strategy, compute units, seed for reproducibility, and more.
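As an illustration of combining these options, a session might look like the following. Only `tts`, `--text`, and `--play` are confirmed above; the commented flag names are assumptions inferred from the option list, not confirmed CLI flags:

```shell
# Only `tts`, --text, and --play are confirmed in these notes; the
# commented line sketches the kinds of options described, using
# hypothetical flag names:
whisperkit-cli tts \
  --text "Hello from TTSKit" \
  --play
  # --voice <speaker> --language <code> --seed <n>
```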

Example App

Along with the CLI, we're also releasing a new example app for developers to reference when building TTSKit into their apps. It features real-time waveform visualization, model management, persistent audio file history with metadata, and multi-platform support.

More info about running this app is in the example's README.md.

Architecture Changes

  • New shared ArgmaxCore target for common utilities
  • TTSKit ships as an optional product in the same Swift package (no breaking changes to existing WhisperKit code).
.target(
    name: "YourApp",
    dependencies: [
        "WhisperKit", // speech-to-text
        "TTSKit",     // text-to-speech
    ]
),
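For context, pulling the optional product into your own manifest might look like the sketch below. The package URL assumes the repository's current name (which, as noted, is slated to change), and the version pin is this release's tag:

```swift
// swift-tools-version:5.9
// Package.swift (sketch): URL and version are assumptions based on the
// repo's current name and this release's tag.
import PackageDescription

let package = Package(
    name: "YourApp",
    dependencies: [
        .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.16.0"),
    ],
    targets: [
        .target(
            name: "YourApp",
            dependencies: [
                .product(name: "WhisperKit", package: "WhisperKit"), // speech-to-text
                .product(name: "TTSKit", package: "WhisperKit"),     // text-to-speech
            ]
        ),
    ]
)
```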
  • The repo will be renamed to reflect the new multi-kit architecture in an upcoming release.

Thank you to @naykutguven and @shura-v for the excellent improvements, listed below, that shipped in this release ahead of TTSKit 🚀

What's Changed

New Contributors

Full Changelog: v0.15.0...v0.16.0