
Commit 16d329a

Merge pull request #171 from gggekov/Conformer_PTQ_ExecuTorch
Add Post-Training Quantization with ExecuTorch of the Conformer NN
2 parents 6474aac + 613cb7b commit 16d329a

File tree

8 files changed: +449 −59 lines changed

Lines changed: 5 additions & 59 deletions
@@ -1,60 +1,6 @@

Removed:

# Conformer-S Model Training

This repository provides an example of training the **Conformer-S** model on the **LibriSpeech** dataset.

## External dependencies
- **Model**: the https://github.com/sooftware/conformer implementation of Conformer-S
- **Dataset**: LibriSpeech (downloaded via `torchaudio`), used both to build the tokenizer and to train the Conformer model
- **Tokenizer**: generated using https://github.com/google/sentencepiece/
- **Python dependencies**: Python packages listed in **requirements.txt**

## Environment description
- AWS g5.24xlarge instance
- Python version 3.12.7
- AWS AMI: Deep Learning OSS Nvidia Driver AMI GPU PyTorch (Ubuntu 22.04)

## Setup
1) Make sure the Conformer repository is cloned in the same directory as the training script:
```
git clone https://github.com/sooftware/conformer.git
```
2) Generate the SentencePiece tokenizer
- More information on what the SentencePiece tokenizer is and how to use it can be found at https://github.com/google/sentencepiece?tab=readme-ov-file#overview
- Generate the tokenizer using the following command (a quick way to verify the result is sketched after this list)
```
python build_sp_128_librispeech.py \
  --root ./data \
  --subset train-clean-100 \
  --output_dir ./tokenizer_out \
  --vocab_size 128 \
  --model_type unigram \
  --lowercase \
  --disable_bos_eos \
  --pad_id -1
```
- Pass the tokenizer path to the training script via the `--sp-model` argument
3) Create an empty `data` folder in the same directory as the training script
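
To sanity-check the generated tokenizer, you can load it with the `sentencepiece` Python package. A minimal sketch; the model filename below is hypothetical, so use whatever `build_sp_128_librispeech.py` actually writes to `tokenizer_out/`:
```python
# Load the generated tokenizer and round-trip a sample transcript.
import sentencepiece as spm

# Hypothetical filename; check the actual output of build_sp_128_librispeech.py.
sp = spm.SentencePieceProcessor(model_file="tokenizer_out/sp.model")
print(sp.get_piece_size())                    # expected: 128, matching --vocab_size
ids = sp.encode("hello world", out_type=int)  # text -> token ids
print(ids, "->", sp.decode(ids))              # ids -> text round trip
```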
## Training
Run the following command to start training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py \
  --train-sets "train-clean-100,train-clean-360,train-other-500" \
  --valid-set "dev-clean" \
  --epochs 160 \
  --batch-size 96 \
  --lr=0.0005 \
  --betas 0.9,0.98 \
  --weight-decay 1e-6 \
  --warmup-epochs 2.0 \
  --grad-clip 5 \
  --root "data" \
  --save-dir "checkpoints" \
  --num-workers=32 \
  --accum-steps 16 \
  2>&1 | tee train_log.txt
```
## Notes and recommendations
- Hyperparameter tuning and active monitoring ("model babysitting") are strongly recommended to achieve optimal performance
- You should be able to reach a WER in the range of 6-7% on the test-clean dataset
- Checkpoints will be saved under the `checkpoints/` directory
- Logs are written to `train_log.txt` for convenience

Added:

# Overview

Conformer is a popular Transformer-based speech recognition network, suitable for embedded devices. This repository contains instructions on how to train and quantize a [Conformer](https://github.com/sooftware/conformer) speech recognition model.
For the quantization of the model, we use ExecuTorch with the Arm® Ethos™-U quantizer.

To train the model, follow the instructions in the `training` folder.
To quantize the model, follow the instructions in the `post_training_quantization` folder.

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@

Added:

# Requirements
In order to run the training and quantization scripts, you need to:
1) Install the [sooftware conformer model](https://github.com/sooftware/conformer) as a pip package (one option is sketched after the key commands below).
2) Download the [LibriSpeech dataset from torchaudio](https://docs.pytorch.org/audio/stable/generated/torchaudio.datasets.LIBRISPEECH.html) and the tokenizer from the `training` folder.
3) For the post-training quantization, you need to install ExecuTorch from source. We recommend installing ExecuTorch in a Python 3.10 virtual environment.

Clone the [ExecuTorch repository](https://github.com/pytorch/executorch/), check out the `release/1.0` branch and run `./install_executorch.sh` from the root folder. You also need to install the Ethos-U backend dependencies within ExecuTorch; you can do that by running `./examples/arm/setup.sh --i-agree-to-the-contained-eula`.
You can find detailed instructions about installing ExecuTorch from source [in the official documentation](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source). The detailed instructions for setting up the Arm backend are in the [examples/arm folder](https://github.com/pytorch/executorch/tree/main/examples/arm#example-workflow). The key commands are:
```
$ git clone git@github.com:pytorch/executorch.git
$ git checkout release/1.0
$ git submodule sync && git submodule update --init --recursive
$ ./install_executorch.sh
$ ./examples/arm/setup.sh --i-agree-to-the-contained-eula
```
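
For step 1, one option is to install the conformer package straight from GitHub. This is a suggestion on our part, assuming the repository's own packaging metadata, not a command from this repo:
```
$ pip install git+https://github.com/sooftware/conformer.git
```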

## Torchcodec
We use `torchaudio` for the pre-processing of the LibriSpeech dataset. Since [August 2025](https://github.com/pytorch/audio/commit/93f582ca5001132bfcdb115f476b73ae60e6ef8a), torchaudio requires torchcodec.
You need to install the matching version of `torchcodec` in order to be able to load audio samples with torchaudio. When you install ExecuTorch from the `release/1.0` branch, you will get torchaudio 2.8.0.dev20250906:
```
$ pip freeze | grep torch
torch==2.9.0.dev20250906
torchaudio==2.8.0.dev20250906
torchvision==0.24.0.dev20250906
....
```
Manually install the torchcodec build corresponding to the nightly tag of torchaudio. In this example, you need the dev20250906 build of torchcodec:
```
$ pip install --pre --no-deps --index-url https://download.pytorch.org/whl/nightly/cpu \
    "torchcodec==0.7.0.dev20250906"
```
As per the [torchcodec documentation](https://github.com/pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec), you need to ensure you have a version of `ffmpeg` lower than 8.
On macOS, you also need to set the `DYLD_FALLBACK_LIBRARY_PATH` environment variable to the location of the ffmpeg libraries that torchcodec loads:
```
export DYLD_FALLBACK_LIBRARY_PATH="/opt/homebrew/opt/ffmpeg@7/lib:/opt/homebrew/lib"
```

You can now use the latest torchaudio and load audio recordings with torchcodec.
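
A quick way to confirm the decoding path works end to end (the file path is a placeholder for any local FLAC/WAV sample):
```python
# Sanity check: torchaudio should now decode audio via torchcodec.
import torchaudio

waveform, sample_rate = torchaudio.load("sample.flac")  # placeholder path
print(waveform.shape, sample_rate)  # e.g. torch.Size([1, 52480]) 16000
```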

# Quantization

The `ptq_evaluate_conformer_10M.py` script provides a way to quantize a Conformer speech recognition network, evaluate its accuracy on the LibriSpeech dataset, and generate an ExecuTorch `.pte` file for the Ethos-U NPU.
We assume you have obtained a trained checkpoint from the training section. Run the `ptq_evaluate_conformer_10M.py` script to obtain a `.pte` file that can be deployed on device, in the following way:
`$ python ptq_evaluate_conformer_10M.py --root <path to the LibriSpeech dataset> --dataset <dataset, usually test-clean> --checkpoint <path to checkpoint with trained weights> --sp-model <path to the tokenizer>`
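
For orientation, the core of such a post-training quantization flow in ExecuTorch's PT2E pipeline looks roughly like the sketch below. This is not the repo script: the import paths, the `ethos-u55-128` target string, and the builder/partitioner names follow the ExecuTorch Arm examples as assumptions and may differ between releases, and `load_trained_conformer`, `get_example_batch`, and `calibration_batches` are placeholders.
```python
# Sketch of PT2E post-training quantization for an Ethos-U target.
# All ExecuTorch names below are assumptions based on the Arm examples
# around release/1.0; check examples/arm in the repo for the exact API.
import torch
from executorch.backends.arm.arm_backend import ArmCompileSpecBuilder
from executorch.backends.arm.ethosu_partitioner import EthosUPartitioner
from executorch.backends.arm.quantizer.arm_quantizer import (
    EthosUQuantizer,
    get_symmetric_quantization_config,
)
from executorch.exir import to_edge_transform_and_lower
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

model = load_trained_conformer().eval()   # placeholder: your trained checkpoint
example_inputs = get_example_batch()      # placeholder: (inputs, input_lengths)

# Compile spec for an example Ethos-U55 configuration with 128 MACs.
compile_spec = ArmCompileSpecBuilder().ethosu_compile_spec("ethos-u55-128").build()

# Export, attach int8 symmetric quantization, calibrate, convert.
exported = torch.export.export_for_training(model, example_inputs).module()
quantizer = EthosUQuantizer(compile_spec)
quantizer.set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
for batch in calibration_batches():       # placeholder: a few LibriSpeech utterances
    prepared(*batch)                      # observers record activation ranges
quantized = convert_pt2e(prepared)

# Lower the quantized graph to the Ethos-U delegate and serialize the .pte.
edge = to_edge_transform_and_lower(
    torch.export.export(quantized, example_inputs),
    partitioner=[EthosUPartitioner(compile_spec)],
)
with open("conformer.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```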

We obtain ~8% Word Error Rate when evaluating the quantized model on the test-clean dataset.
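
As a reminder of the metric, Word Error Rate is the word-level edit distance between hypothesis and reference, divided by the reference length; `torchaudio` ships a helper for the distance:
```python
# Word Error Rate = word-level edit distance / number of reference words.
import torchaudio.functional as F

reference = "the quick brown fox jumps".split()
hypothesis = "the quick brown box".split()  # one substitution, one deletion
wer = F.edit_distance(reference, hypothesis) / len(reference)
print(f"WER: {wer:.1%}")  # 40.0%
```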
