
v0.16.0


@ZachNagengast ZachNagengast released this 03 Mar 02:49

Highlights

This release introduces TTSKit, a brand-new optional library that brings high-quality, on-device text-to-speech using the latest Core ML features, such as MLState and MLTensor, for optimal inference on the Apple Neural Engine.

With this first release, we're launching the Qwen3-TTS CustomVoice models (0.6B and 1.7B) with instruction control, and more are on the way in future releases, including voice cloning.

Download, load, generate, and stream playback in 3 lines of code:

import TTSKit

let ttsKit = try await TTSKit()
try await ttsKit.play(text: "Hello from TTSKit!")

Key Features

  • Real-time adaptive streaming
    • Playback begins while audio is still generating, minimizing the time from text input to first audio buffer output
    • .auto mode adapts to the inference speed of the device for consistent, smooth playback
  • 9 built-in voices
  • 10 languages
  • Style instruction support (1.7B model only)
  • Automatic chunking for long-form inputs
  • Audio file exports in WAV/M4A format with optional metadata
  • Modular protocol-based architecture (6 swappable Core ML components) for easy customization and future model adoption
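To make the streaming feature concrete, here's a minimal sketch of what opting into the adaptive mode might look like. Only `TTSKit()` and `play(text:)` are confirmed in these notes; the commented `streamingMode:` label is a hypothetical parameter name for illustration — check the README for the actual signatures.

```swift
import TTSKit

// Confirmed quick-start API from this release:
let ttsKit = try await TTSKit()

// Playback starts while audio is still generating. The commented
// parameter below is a hypothetical name sketching how the `.auto`
// adaptive mode described above might be selected:
try await ttsKit.play(
    text: "Hello from TTSKit!"
    // , streamingMode: .auto  // adapts chunking to device inference speed
)
```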

See the new TTSKit section in the README.md for full API docs, model selection, and advanced usage.

CLI

Try it out with the following command:

swift run -c release whisperkit-cli tts --text "Hello from TTSKit" --play

It's also available via Homebrew upon release:

brew install whisperkit-cli
whisperkit-cli tts --text "Hello from TTSKit" --play

The CLI gives full control over speaker, language, model variant, style, temperature, chunking strategy, compute units, seed for reproducibility, and more.
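As an illustration of combining these options, a session might look like the following. Only `tts`, `--text`, and `--play` are confirmed above; the commented flag names are assumptions inferred from the option list, not confirmed CLI flags:

```shell
# Only `tts`, --text, and --play are confirmed in these notes; the
# commented line sketches the kinds of options described, using
# hypothetical flag names:
whisperkit-cli tts \
  --text "Hello from TTSKit" \
  --play
  # --voice <speaker> --language <code> --seed <n>
```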

Example App

Along with the CLI, we're also releasing a new example app for developers to reference when building TTSKit into their apps. It features real-time waveform visualization, model management, persistent audio file history with metadata, and multi-platform support.

More info about running this app is in the example's README.md.

Architecture Changes

  • New shared ArgmaxCore target for common utilities
  • TTSKit ships as an optional product in the same Swift package (no breaking changes to existing WhisperKit code).
.target(
    name: "YourApp",
    dependencies: [
        "WhisperKit", // speech-to-text
        "TTSKit",     // text-to-speech
    ]
),
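For context, pulling the optional product into your own manifest might look like the sketch below. The package URL assumes the repository's current name (which, as noted, is slated to change), and the version pin is this release's tag:

```swift
// swift-tools-version:5.9
// Package.swift (sketch): URL and version are assumptions based on the
// repo's current name and this release's tag.
import PackageDescription

let package = Package(
    name: "YourApp",
    dependencies: [
        .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.16.0"),
    ],
    targets: [
        .target(
            name: "YourApp",
            dependencies: [
                .product(name: "WhisperKit", package: "WhisperKit"), // speech-to-text
                .product(name: "TTSKit", package: "WhisperKit"),     // text-to-speech
            ]
        ),
    ]
)
```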
  • The repo will be renamed to reflect the new multi-kit architecture in an upcoming release.

Thank you to @naykutguven and @shura-v for the excellent improvements, listed below, that shipped in this release ahead of TTSKit 🚀

What's Changed

New Contributors

Full Changelog: v0.15.0...v0.16.0