
Commit 16d329a

Merge pull request #171 from gggekov/Conformer_PTQ_ExecuTorch
Add Post-Training Quantization with ExecuTorch of the Conformer NN
2 parents 6474aac + 613cb7b commit 16d329a

File tree

8 files changed: +449 −59 lines changed

Lines changed: 5 additions & 59 deletions
@@ -1,60 +1,6 @@

Removed:

# Conformer-S Model Training

This repository provides an example of training the **Conformer-S** model on the **LibriSpeech** dataset.

## External dependencies
- **Model**: the https://github.com/sooftware/conformer implementation of Conformer-S
- **Dataset**: LibriSpeech (downloaded via `torchaudio`), used both to build the tokenizer and to train the Conformer model
- **Tokenizer**: generated using https://github.com/google/sentencepiece/
- **Python dependencies**: Python packages listed in **requirements.txt**

## Environment description
- AWS g5.24xlarge instance
- Python version 3.12.7
- AWS AMI: Deep Learning OSS Nvidia Driver AMI GPU PyTorch (Ubuntu 22.04)

## Setup
1) Make sure the Conformer repository is cloned in the same directory as the training script:
```
git clone https://github.com/sooftware/conformer.git
```
2) Generate the SentencePiece tokenizer
- More information on what the SentencePiece tokenizer is and how to use it can be found at https://github.com/google/sentencepiece?tab=readme-ov-file#overview
- Generate the tokenizer using the following command (a quick way to verify the result is sketched after this list)
```
python build_sp_128_librispeech.py \
  --root ./data \
  --subset train-clean-100 \
  --output_dir ./tokenizer_out \
  --vocab_size 128 \
  --model_type unigram \
  --lowercase \
  --disable_bos_eos \
  --pad_id -1
```
- Pass the tokenizer path to the training script via the `--sp-model` argument
3) Create an empty `data` folder in the same directory as the training script
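
To sanity-check the generated tokenizer, you can load it with the `sentencepiece` Python package. A minimal sketch; the model filename below is hypothetical, so use whatever `build_sp_128_librispeech.py` actually writes to `tokenizer_out/`:
```python
# Load the generated tokenizer and round-trip a sample transcript.
import sentencepiece as spm

# Hypothetical filename; check the actual output of build_sp_128_librispeech.py.
sp = spm.SentencePieceProcessor(model_file="tokenizer_out/sp.model")
print(sp.get_piece_size())                    # expected: 128, matching --vocab_size
ids = sp.encode("hello world", out_type=int)  # text -> token ids
print(ids, "->", sp.decode(ids))              # ids -> text round trip
```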
## Training
Run the following command to start training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py \
  --train-sets "train-clean-100,train-clean-360,train-other-500" \
  --valid-set "dev-clean" \
  --epochs 160 \
  --batch-size 96 \
  --lr=0.0005 \
  --betas 0.9,0.98 \
  --weight-decay 1e-6 \
  --warmup-epochs 2.0 \
  --grad-clip 5 \
  --root "data" \
  --save-dir "checkpoints" \
  --num-workers=32 \
  --accum-steps 16 \
  2>&1 | tee train_log.txt
```
## Notes and recommendations
- Hyperparameter tuning and active monitoring ("model babysitting") are strongly recommended to achieve optimal performance
- You should be able to reach a WER in the range of 6-7% on the test-clean dataset
- Checkpoints will be saved under the `checkpoints/` directory
- Logs are written to `train_log.txt` for convenience

Added:

# Overview

Conformer is a popular Transformer-based speech recognition network, suitable for embedded devices. This repository contains instructions on how to train and quantize a [Conformer](https://github.com/sooftware/conformer) speech recognition model.
For the quantization of the model, we use ExecuTorch with the Arm® Ethos™-U quantizer.

To train the model, follow the instructions in the `training` folder.
To quantize the model, follow the instructions in the `post_training_quantization` folder.

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@

Added:

# Requirements
In order to run the training and quantization scripts, you need to:
1) Install the [sooftware conformer model](https://github.com/sooftware/conformer) as a pip package (one option is sketched after the key commands below).
2) Download the [LibriSpeech dataset from torchaudio](https://docs.pytorch.org/audio/stable/generated/torchaudio.datasets.LIBRISPEECH.html) and the tokenizer from the `training` folder.
3) For the post-training quantization, you need to install ExecuTorch from source. We recommend installing ExecuTorch in a Python 3.10 virtual environment.

Clone the [ExecuTorch repository](https://github.com/pytorch/executorch/), check out the `release/1.0` branch and run `./install_executorch.sh` from the root folder. You also need to install the Ethos-U backend dependencies within ExecuTorch; you can do that by running `./examples/arm/setup.sh --i-agree-to-the-contained-eula`.
You can find detailed instructions about installing ExecuTorch from source [in the official documentation](https://docs.pytorch.org/executorch/stable/using-executorch-building-from-source.html#install-executorch-pip-package-from-source). The detailed instructions for setting up the Arm backend are in the [examples/arm folder](https://github.com/pytorch/executorch/tree/main/examples/arm#example-workflow). The key commands are:
```
$ git clone git@github.com:pytorch/executorch.git
$ git checkout release/1.0
$ git submodule sync && git submodule update --init --recursive
$ ./install_executorch.sh
$ ./examples/arm/setup.sh --i-agree-to-the-contained-eula
```
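
For step 1, one option is to install the conformer package straight from GitHub. This is a suggestion on our part, assuming the repository's own packaging metadata, not a command from this repo:
```
$ pip install git+https://github.com/sooftware/conformer.git
```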

## Torchcodec
We use `torchaudio` for the pre-processing of the LibriSpeech dataset. Since [August 2025](https://github.com/pytorch/audio/commit/93f582ca5001132bfcdb115f476b73ae60e6ef8a), torchaudio requires torchcodec.
You need to install the matching version of `torchcodec` in order to be able to load audio samples with torchaudio. When you install ExecuTorch from the `release/1.0` branch, you will get torchaudio 2.8.0.dev20250906:
```
$ pip freeze | grep torch
torch==2.9.0.dev20250906
torchaudio==2.8.0.dev20250906
torchvision==0.24.0.dev20250906
....
```
Manually install the torchcodec build corresponding to the nightly tag of torchaudio. In this example, you need the dev20250906 build of torchcodec:
```
$ pip install --pre --no-deps --index-url https://download.pytorch.org/whl/nightly/cpu \
    "torchcodec==0.7.0.dev20250906"
```
As per the [torchcodec documentation](https://github.com/pytorch/torchcodec?tab=readme-ov-file#installing-torchcodec), you need to ensure you have a version of `ffmpeg` lower than 8.
On macOS, you also need to set the `DYLD_FALLBACK_LIBRARY_PATH` environment variable to the location of the ffmpeg libraries that torchcodec loads:
```
export DYLD_FALLBACK_LIBRARY_PATH="/opt/homebrew/opt/ffmpeg@7/lib:/opt/homebrew/lib"
```

You can now use the latest torchaudio and load audio recordings with torchcodec.
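
A quick way to confirm the decoding path works end to end (the file path is a placeholder for any local FLAC/WAV sample):
```python
# Sanity check: torchaudio should now decode audio via torchcodec.
import torchaudio

waveform, sample_rate = torchaudio.load("sample.flac")  # placeholder path
print(waveform.shape, sample_rate)  # e.g. torch.Size([1, 52480]) 16000
```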

# Quantization

The `ptq_evaluate_conformer_10M.py` script provides a way to quantize a Conformer speech recognition network, evaluate its accuracy on the LibriSpeech dataset, and generate an ExecuTorch `.pte` file for the Ethos-U NPU.
We assume you have obtained a trained checkpoint from the training section. Run the `ptq_evaluate_conformer_10M.py` script to obtain a `.pte` file that can be deployed on device, in the following way:
`$ python ptq_evaluate_conformer_10M.py --root <path to the LibriSpeech dataset> --dataset <dataset, usually test-clean> --checkpoint <path to checkpoint with trained weights> --sp-model <path to the tokenizer>`
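
For orientation, the core of such a post-training quantization flow in ExecuTorch's PT2E pipeline looks roughly like the sketch below. This is not the repo script: the import paths, the `ethos-u55-128` target string, and the builder/partitioner names follow the ExecuTorch Arm examples as assumptions and may differ between releases, and `load_trained_conformer`, `get_example_batch`, and `calibration_batches` are placeholders.
```python
# Sketch of PT2E post-training quantization for an Ethos-U target.
# All ExecuTorch names below are assumptions based on the Arm examples
# around release/1.0; check examples/arm in the repo for the exact API.
import torch
from executorch.backends.arm.arm_backend import ArmCompileSpecBuilder
from executorch.backends.arm.ethosu_partitioner import EthosUPartitioner
from executorch.backends.arm.quantizer.arm_quantizer import (
    EthosUQuantizer,
    get_symmetric_quantization_config,
)
from executorch.exir import to_edge_transform_and_lower
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

model = load_trained_conformer().eval()   # placeholder: your trained checkpoint
example_inputs = get_example_batch()      # placeholder: (inputs, input_lengths)

# Compile spec for an example Ethos-U55 configuration with 128 MACs.
compile_spec = ArmCompileSpecBuilder().ethosu_compile_spec("ethos-u55-128").build()

# Export, attach int8 symmetric quantization, calibrate, convert.
exported = torch.export.export_for_training(model, example_inputs).module()
quantizer = EthosUQuantizer(compile_spec)
quantizer.set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
for batch in calibration_batches():       # placeholder: a few LibriSpeech utterances
    prepared(*batch)                      # observers record activation ranges
quantized = convert_pt2e(prepared)

# Lower the quantized graph to the Ethos-U delegate and serialize the .pte.
edge = to_edge_transform_and_lower(
    torch.export.export(quantized, example_inputs),
    partitioner=[EthosUPartitioner(compile_spec)],
)
with open("conformer.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```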

We obtain ~8% Word Error Rate when evaluating the quantized model on the test-clean dataset.
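
As a reminder of the metric, Word Error Rate is the word-level edit distance between hypothesis and reference, divided by the reference length; `torchaudio` ships a helper for the distance:
```python
# Word Error Rate = word-level edit distance / number of reference words.
import torchaudio.functional as F

reference = "the quick brown fox jumps".split()
hypothesis = "the quick brown box".split()  # one substitution, one deletion
wer = F.edit_distance(reference, hypothesis) / len(reference)
print(f"WER: {wer:.1%}")  # 40.0%
```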
