Qwen3 ASR -- Rust CLI tools

Pure Rust implementation of Qwen3-ASR automatic speech recognition. The project builds a cross-platform CLI tool that AI agents and bots can use as a skill.

asr generates text from an input audio file (supports most codecs and file formats).

The tool supports two backends: libtorch (via the tch crate, cross-platform with optional CUDA) and MLX (Apple Silicon native via Metal GPU). It loads model weights directly from safetensors files and re-implements the complete neural network forward pass in Rust.
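
To illustrate what loading weights "directly from safetensors files" involves, here is a minimal sketch of the first step: separating the JSON header from the raw tensor bytes. It relies only on the public safetensors layout (an 8-byte little-endian header length, the JSON header, then tensor data). The function name split_safetensors is hypothetical and is not taken from this project's source.

```rust
/// Splits a safetensors buffer into its JSON header and raw tensor bytes.
///
/// Layout: 8-byte little-endian u64 header length N, then N bytes of UTF-8
/// JSON, then the tensor data. Returns None if the buffer is malformed.
/// NOTE: hypothetical helper for illustration, not this project's API.
pub fn split_safetensors(buf: &[u8]) -> Option<(&str, &[u8])> {
    // Read the header length from the first 8 bytes.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(buf.get(..8)?);
    let header_len = u64::from_le_bytes(len_bytes);

    // Reject headers that claim to extend past the end of the buffer.
    if header_len > (buf.len() - 8) as u64 {
        return None;
    }
    let header_end = 8 + header_len as usize;

    // The header must be valid UTF-8 (it is JSON describing each tensor).
    let header = std::str::from_utf8(&buf[8..header_end]).ok()?;
    Some((header, &buf[header_end..]))
}
```

The returned header string maps each tensor name to its dtype, shape, and byte offsets into the data slice. A JSON parser would be needed to go further.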

Learn more:

A Rust implementation / CLI for Qwen3's TTS (text-to-speech, or speech synthesis) models
An OpenAI-compatible API server for audio / speech
An OpenClaw SKILL for voice recognition. Copy and paste it to your lobster to install it.

Quick Start

The install script automatically detects your platform (macOS/Linux, CPU/CUDA GPU), downloads the correct release binary, model weights, and a sample audio file:

curl -sSf https://raw.githubusercontent.com/second-state/qwen3_asr_rs/main/install.sh | bash

The installer will prompt you to choose a model size (0.6B recommended) and, on Linux with an NVIDIA GPU, whether to use CUDA or CPU.

Once complete, run your first transcription:

cd qwen3_asr_rs
./asr ./Qwen3-ASR-0.6B sample.wav

Output:

Language: English
Text: Thank you for your contribution to the most recent issue of Computer.

Architecture

The implementation ports the Qwen3-ASR encoder-decoder architecture from PyTorch/Transformers to Rust with libtorch (via the tch crate):

Audio Encoder (Whisper-style): 3x Conv2d downsampling → sinusoidal positional embeddings → 18 transformer encoder layers → output projection (896 → 1024)
Text Decoder (Qwen3): 28 transformer decoder layers with Grouped Query Attention (16 Q heads / 8 KV heads), QK-normalization, MRoPE (Multimodal Rotary Position Embeddings), and SwiGLU MLP
Audio preprocessing: FFmpeg decodes any audio format → resampled to mono 16 kHz f32 → 128-bin log-mel spectrogram (Whisper-style)
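
The Grouped Query Attention figures in the decoder description (16 Q heads / 8 KV heads) mean that each KV head is shared by two query heads. The sketch below shows that mapping. It assumes consecutive query heads share a KV head, as in the Transformers reference implementation. The constant and function names are illustrative and are not taken from this project's source.

```rust
/// Head counts taken from the text decoder description.
pub const NUM_Q_HEADS: usize = 16;
pub const NUM_KV_HEADS: usize = 8;

/// Returns the index of the KV head that a given query head attends with.
/// Assumes consecutive query heads share one KV head (illustrative only).
pub fn kv_head_for(q_head: usize) -> usize {
    assert!(q_head < NUM_Q_HEADS, "query head index out of range");
    // 16 / 8 = 2 query heads per KV head.
    let group_size = NUM_Q_HEADS / NUM_KV_HEADS;
    q_head / group_size
}
```

Under this grouping, query heads 0 and 1 read KV head 0, query heads 2 and 3 read KV head 1, and so on. This halves the KV cache size relative to full multi-head attention.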

Supported Models

Model	Parameters	HuggingFace
Qwen3-ASR-0.6B	0.6B	Qwen/Qwen3-ASR-0.6B
Qwen3-ASR-1.7B	1.7B	Qwen/Qwen3-ASR-1.7B

Usage

# Basic transcription (auto-detect language)
asr ./Qwen3-ASR-0.6B input.wav

# Force language
asr ./Qwen3-ASR-0.6B input.wav chinese
asr ./Qwen3-ASR-0.6B input.wav english

# Enable debug logging
RUST_LOG=debug asr ./Qwen3-ASR-0.6B input.wav
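
For agents or bots that call the CLI from Rust, the invocations above can be assembled with std::process::Command. This is a sketch only: the helper name asr_command is hypothetical, and it assumes the asr binary is on PATH and takes the positional arguments shown above (model directory, audio file, optional language).

```rust
use std::process::Command;

/// Builds (but does not run) an `asr` invocation.
/// NOTE: hypothetical helper. Assumes `asr` is on PATH.
pub fn asr_command(
    model_dir: &str,
    audio: &str,
    language: Option<&str>,
    debug: bool,
) -> Command {
    let mut cmd = Command::new("asr");
    cmd.arg(model_dir).arg(audio);
    // The optional third positional argument forces the language.
    if let Some(lang) = language {
        cmd.arg(lang);
    }
    // Mirrors `RUST_LOG=debug asr ...` from the usage examples.
    if debug {
        cmd.env("RUST_LOG", "debug");
    }
    cmd
}
```

Calling .output() on the returned Command runs the tool and captures its stdout and stderr.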

Output Format

Language: Chinese
Text: 你好世界
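
A caller that consumes this output programmatically might parse it as in the sketch below. It assumes the output consists of one line starting with "Language:" and one line starting with "Text:", as in the sample above, and that the transcript fits on a single line. The Transcript type and parse_output function are hypothetical names.

```rust
/// Transcription result parsed from the CLI's two-line output.
/// NOTE: hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub struct Transcript {
    pub language: String,
    pub text: String,
}

/// Extracts the "Language:" and "Text:" fields from captured stdout.
/// Returns None if either field is missing.
pub fn parse_output(stdout: &str) -> Option<Transcript> {
    let mut language = None;
    let mut text = None;
    for line in stdout.lines() {
        if let Some(rest) = line.strip_prefix("Language:") {
            language = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("Text:") {
            text = Some(rest.trim().to_string());
        }
    }
    Some(Transcript {
        language: language?,
        text: text?,
    })
}
```

Lines that match neither prefix are ignored, so unrelated output does not break the parse.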

Supported Languages

Qwen3-ASR supports 30 languages: Chinese, English, Cantonese, Arabic, German, French, Spanish, Portuguese, Indonesian, Italian, Korean, Russian, Thai, Vietnamese, Japanese, Turkish, Hindi, Malay, Dutch, Swedish, Danish, Finnish, Polish, Czech, Filipino, Persian, Greek, Romanian, Hungarian, Macedonian.

Build from Source

Prerequisites

Download model weights and generate the tokenizer:

pip install huggingface_hub transformers

huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir Qwen3-ASR-0.6B

python -c "
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained('Qwen3-ASR-0.6B', trust_remote_code=True)
tok.backend_tokenizer.save('Qwen3-ASR-0.6B/tokenizer.json')
"

Build for macOS (MLX)

Install dependencies:

brew install ffmpeg

Build:

git submodule update --init --recursive
cargo build --release --no-default-features --features mlx,build-ffmpeg

Build for Linux (libtorch)

Download and extract libtorch for your platform from libtorch-releases:

# Linux x86_64 (CPU)
curl -LO https://github.com/second-state/libtorch-releases/releases/download/v2.7.1/libtorch-cxx11-abi-x86_64-2.7.1.tar.gz
tar xzf libtorch-cxx11-abi-x86_64-2.7.1.tar.gz

# Linux x86_64 (CUDA 12.6)
curl -LO https://github.com/second-state/libtorch-releases/releases/download/v2.7.1/libtorch-cxx11-abi-x86_64-cuda12.6-2.7.1.tar.gz
tar xzf libtorch-cxx11-abi-x86_64-cuda12.6-2.7.1.tar.gz

# Linux ARM64 (CPU)
curl -LO https://github.com/second-state/libtorch-releases/releases/download/v2.7.1/libtorch-cxx11-abi-aarch64-2.7.1.tar.gz
tar xzf libtorch-cxx11-abi-aarch64-2.7.1.tar.gz

# Linux ARM64 (CUDA 12.6 / Jetson)
curl -LO https://github.com/second-state/libtorch-releases/releases/download/v2.7.1/libtorch-cxx11-abi-aarch64-cuda12.6-2.7.1.tar.gz
tar xzf libtorch-cxx11-abi-aarch64-cuda12.6-2.7.1.tar.gz

Set environment variables:

export LIBTORCH="$(pwd)/libtorch"
export LIBTORCH_BYPASS_VERSION_CHECK=1

Install dependencies and build:

sudo apt-get install -y nasm pkg-config
cargo build --release --features build-ffmpeg

Project Structure

src/
├── main.rs            # CLI binary entry point
├── lib.rs             # Library module declarations
├── tensor.rs          # Unified Tensor abstraction (tch/MLX backend)
├── config.rs          # Model configuration (from config.json)
├── error.rs           # Error types
├── audio.rs           # FFmpeg-based audio loading and format conversion
├── mel.rs             # Whisper-style mel spectrogram feature extraction
├── weights.rs         # Safetensors weight loading (bf16 → f32 conversion)
├── layers.rs          # Neural network building blocks (LayerNorm, RMSNorm,
│                      #   attention, MLP, MRoPE, etc.)
├── audio_encoder.rs   # Whisper-style audio encoder (Conv2d + Transformer)
├── text_decoder.rs    # Qwen3 text decoder with KV cache
├── tokenizer.rs       # HuggingFace tokenizer wrapper
├── inference.rs       # End-to-end ASR inference pipeline
└── backend/
    └── mlx/           # Apple MLX backend (Metal GPU)
        ├── ffi.rs     # Raw C FFI bindings to mlx-c
        ├── array.rs   # Safe RAII MlxArray wrapper
        ├── ops.rs     # Safe operation wrappers
        ├── io.rs      # Safetensors loading via mlx-c
        ├── signal.rs  # STFT, mel spectrogram signal processing
        └── stream.rs  # Device/stream management
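
The bf16 → f32 conversion noted for weights.rs is lossless, because bfloat16 is the upper 16 bits of an IEEE 754 f32. The following is a minimal sketch of that widening step, not the project's actual code. The function names are hypothetical, and the byte buffer is assumed to be little-endian, as safetensors stores it.

```rust
/// Widens one bfloat16 value (given as raw bits) to f32.
/// bf16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits
/// of an f32, so shifting left by 16 restores the exact value.
pub fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Converts a little-endian bf16 byte buffer into f32 values.
/// A trailing odd byte, if any, is ignored.
pub fn bf16_bytes_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}
```

For example, the bf16 bit pattern 0x3F80 widens to 0x3F800000, which is 1.0 as an f32.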

License

Apache-2.0
