Add qwen-audio style model training: using whisper + qwen2 (#1652)
1 parent 3b40d9b, commit 890eeec
Showing 12 changed files with 2,324 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
|
||
# Introduction | ||
|
||
This recipe includes scripts for training a [Qwen-Audio](https://github.com/QwenLM/Qwen-Audio/tree/main)-style model using multiple datasets. | ||
|
||
<br> | ||
<p align="center"> | ||
<img src="assets/framework.png" width="800"/> | ||
</p> | ||
<br> | ||
|
||
[./RESULTS.md](./RESULTS.md) contains the latest results. | ||
|
||
# ASR_LLM | ||
|
||
The following table lists the folders for different tasks. | ||
|
||
| Folder | Speech Encoder | LLM | Comment | | ||
|---------------------------------------|---------------------|--------------------|---------------------------------------------------| | ||
| [whisper_llm_zh](./whisper_llm_zh) | Whisper | Qwen2 | [Using multiple Chinese datasets](https://github.com/k2-fsa/icefall/tree/master/egs/multi_zh-hans/ASR) | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
## Results | ||
|
||
### whisper_llm_zh finetuning results | ||
|
||
| Training Dataset | Speech Encoder | LLM | Projector | Comment | CER | | ||
|------------------|----------------|-----|-----------|---------|-----| | ||
| Aishell1 | whisper-large-v2-aishell1-ft, frozen | Qwen2-1.5B-Instruct, LoRA | Linear, 8x downsample | [yuekai/icefall_asr_aishell_whisper_qwen2_1.5B](https://huggingface.co/yuekai/icefall_asr_aishell_whisper_qwen2_1.5B) | 3.62% (Aishell1 test) | | ||
<!-- | Multi-hans-zh | whisper-large-v2-multi-hans-ft, freeze| Qwen2-1.5B-Instruct, LoRA | Linear, 8x downsample| WIP || | ||
| Multi-hans-zh | whisper-large-v2-multi-hans-ft, freeze| Qwen2-7B-Instruct, LoRA | Linear, 8x downsample| WIP || --> | ||
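The "Linear, 8x downsample" projector in the table can be pictured as follows. This is only a minimal sketch under an assumption: it takes "8x downsample" to mean that every 8 consecutive encoder frames are concatenated before a linear layer maps them to the LLM embedding size. The helper names `stack_frames` and `linear`, the toy dimensions, and the weights are hypothetical and are not taken from the recipe's code.

```python
from typing import List

Frame = List[float]


def stack_frames(frames: List[Frame], k: int = 8) -> List[Frame]:
    """Concatenate every k consecutive frames into one longer frame.

    Trailing frames that do not fill a complete group of k are dropped.
    """
    usable = len(frames) - len(frames) % k
    stacked: List[Frame] = []
    for start in range(0, usable, k):
        merged: Frame = []
        for frame in frames[start:start + k]:
            merged.extend(frame)
        stacked.append(merged)
    return stacked


def linear(x: Frame, weight: List[Frame], bias: Frame) -> Frame:
    """Plain y = W x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


# Toy input: 20 encoder frames of dimension 2 (real encoders are far wider).
frames = [[float(t), float(-t)] for t in range(20)]
stacked = stack_frames(frames, k=8)  # 2 frames of dimension 16

# Hypothetical projection to a 3-dimensional "LLM embedding" space.
weight = [[0.1] * 16 for _ in range(3)]
bias = [0.0] * 3
projected = [linear(x, weight, bias) for x in stacked]

print(len(frames), "->", len(projected), "frames of dim", len(projected[0]))
```

Under this reading, the sequence the LLM sees is 8 times shorter than the encoder output, which keeps the prompt length manageable for long utterances.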
|
||
Commands for training: | ||
```bash | ||
pip install -r whisper_llm_zh/requirements.txt | ||
|
||
pip install "huggingface_hub[cli]" | ||
mkdir -p models/whisper models/qwen | ||
|
||
# For aishell fine-tuned whisper model | ||
huggingface-cli download --local-dir models/whisper yuekai/icefall_asr_aishell_whisper exp_large_v2/whisper-large-v2-aishell1-epoch-10-avg-6.pt | ||
# For multi-hans fine-tuned whisper model | ||
# huggingface-cli download --local-dir models/whisper yuekai/icefall_asr_multi-hans-zh_whisper v1.1/whisper-large-v2-multi-hans-zh-epoch-3-avg-10.pt | ||
|
||
# huggingface-cli download --local-dir models/qwen Qwen/Qwen2-7B-Instruct | ||
huggingface-cli download --local-dir models/qwen Qwen/Qwen2-1.5B-Instruct | ||
|
||
torchrun --nproc_per_node 8 ./whisper_llm_zh/train.py \ | ||
--max-duration 200 \ | ||
--exp-dir ./whisper_llm_zh/exp_test \ | ||
--speech-encoder-path-or-name models/whisper/exp_large_v2/whisper-large-v2-aishell1-epoch-10-avg-6.pt \ | ||
--llm-path-or-name Qwen/Qwen2-1.5B-Instruct \ | ||
--manifest-dir data/fbank \ | ||
--deepspeed \ | ||
--deepspeed_config ./whisper_llm_zh/ds_config_zero1.json \ | ||
--use-flash-attn True \ | ||
--use-lora True --unfreeze-llm True | ||
``` | ||
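In icefall recipes, `--max-duration` bounds the total audio duration, in seconds, pooled into a single batch. The sketch below illustrates that idea with a simple greedy grouping. It is not the sampler the recipe actually uses (the real one also shuffles and buckets by length), and `batch_by_duration` and the example durations are hypothetical.

```python
from typing import Iterable, List


def batch_by_duration(durations: Iterable[float], max_duration: float) -> List[List[float]]:
    """Greedily group utterance durations so each batch stays within max_duration."""
    batches: List[List[float]] = []
    current: List[float] = []
    total = 0.0
    for d in durations:
        # Start a new batch when adding this utterance would exceed the cap.
        if current and total + d > max_duration:
            batches.append(current)
            current, total = [], 0.0
        current.append(d)
        total += d
    if current:
        batches.append(current)
    return batches


# Hypothetical utterance lengths in seconds.
utterances = [12.0, 30.0, 95.5, 60.0, 80.0, 41.0, 7.5]
batches = batch_by_duration(utterances, max_duration=200.0)
for i, batch in enumerate(batches):
    print(f"batch {i}: {len(batch)} utterances, {sum(batch):.1f} s")
```

With a cap like this, batches of short utterances contain many items and batches of long utterances contain few, so memory use per step stays roughly constant.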
|
||
Commands for decoding with the fine-tuned model: | ||
```bash | ||
mkdir -p models/whisper models/qwen models/checkpoint | ||
huggingface-cli download --local-dir models/checkpoint yuekai/icefall_asr_aishell_whisper_qwen2_1.5B | ||
|
||
# For aishell fine-tuned whisper model | ||
huggingface-cli download --local-dir models/whisper yuekai/icefall_asr_aishell_whisper exp_large_v2/whisper-large-v2-aishell1-epoch-10-avg-6.pt | ||
# For multi-hans fine-tuned whisper model | ||
# huggingface-cli download --local-dir models/whisper yuekai/icefall_asr_multi-hans-zh_whisper v1.1/whisper-large-v2-multi-hans-zh-epoch-3-avg-10.pt | ||
|
||
# Base LLM matching the 1.5B checkpoint downloaded above | ||
huggingface-cli download --local-dir models/qwen Qwen/Qwen2-1.5B-Instruct | ||
|
||
mkdir -p whisper_llm_zh/exp_aishell_whisper_qwen2_1.5B | ||
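# Use an absolute target: a relative symlink target is resolved against the link's own directory | ||
ln -s "$(pwd)/models/checkpoint/epoch-10-avg-5.pt" whisper_llm_zh/exp_aishell_whisper_qwen2_1.5B/epoch-999.pt | ||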
|
||
python3 ./whisper_llm_zh/decode.py \ | ||
--max-duration 80 \ | ||
--exp-dir whisper_llm_zh/exp_aishell_whisper_qwen2_1.5B \ | ||
--speech-encoder-path-or-name models/whisper/exp_large_v2/whisper-large-v2-aishell1-epoch-10-avg-6.pt \ | ||
--llm-path-or-name models/qwen \ | ||
--epoch 999 --avg 1 \ | ||
--manifest-dir data/fbank \ | ||
--use-flash-attn True \ | ||
--use-lora True --dataset aishell | ||
``` |
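The CER reported in the results table is a character error rate. As a reference point, the sketch below computes it with the standard definition: total character-level edit distance divided by the total number of reference characters. The recipe's own scoring may apply text normalization that is not reproduced here, and the example sentences are made up.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level character error rate."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return errors / chars


# Made-up reference and hypothesis transcripts.
refs = ["今天天气很好", "我喜欢语音识别"]
hyps = ["今天天气很好", "我喜欢语言识别"]
print(f"CER = {100 * cer(refs, hyps):.2f}%")
```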
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
#!/usr/bin/env bash | ||
|
||
# fix segmentation fault reported in https://github.com/k2-fsa/icefall/issues/674 | ||
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python | ||
|
||
set -euo pipefail | ||
|
||
stage=0 | ||
stop_stage=0 | ||
# All files generated by this script are saved in "data". | ||
# You can safely remove "data" and rerun this script to regenerate it. | ||
mkdir -p data | ||
|
||
log() { | ||
# This function is from espnet | ||
local fname=${BASH_SOURCE[1]##*/} | ||
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*" | ||
} | ||
|
||
|
||
if [ $stage -le 0 ] && [ $stop_stage -ge 0 ]; then | ||
log "stage 0: Download whisper-large-v2 aishell 1 fbank feature from huggingface" | ||
|
||
# pip install "huggingface_hub[cli]" | ||
# for aishell 1 | ||
huggingface-cli download --local-dir data yuekai/aishell_whisper_fbank_lhotse | ||
|
||
fi | ||
|
||
if [ $stage -le 1 ] && [ $stop_stage -ge 1 ]; then | ||
log "stage 1: Download whisper-large-v2 multi-hans-zh fbank feature from huggingface" | ||
|
||
# for multi-hans-zh | ||
huggingface-cli download --local-dir data/fbank yuekai/wenetspeech_whisper_fbank_lhotse | ||
huggingface-cli download --local-dir data/fbank yuekai/multi_hans_zh_whisper_fbank_lhotse | ||
huggingface-cli download --local-dir data/fbank yuekai/alimeeting_aishell4_training_whisper_fbank_lhotse | ||
fi | ||
|
||
if [ $stage -le 2 ] && [ $stop_stage -ge 2 ]; then | ||
log "stage 2: Download whisper-large-v2 speechio test sets fbank feature from huggingface" | ||
|
||
# for speechio test sets | ||
mkdir -p data_speechio | ||
huggingface-cli download --local-dir data_speechio yuekai/icefall_asr_speechio | ||
mv data_speechio/fbank/* data/fbank | ||
fi |
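Each block in the script above is guarded by `[ $stage -le N ] && [ $stop_stage -ge N ]`, so stage `N` runs only when `stage <= N <= stop_stage`. The following sketch restates that selection rule in Python for clarity. The function `stages_to_run` and the short stage labels are illustrative only and do not exist in the recipe.

```python
from typing import Dict, List

# Short labels paraphrasing the log messages of the three stages (illustrative only).
STAGES: Dict[int, str] = {
    0: "aishell 1 fbank features",
    1: "multi-hans-zh fbank features",
    2: "speechio test set fbank features",
}


def stages_to_run(stage: int, stop_stage: int) -> List[int]:
    """Return the stage numbers N that satisfy stage <= N <= stop_stage."""
    return [n for n in sorted(STAGES) if stage <= n <= stop_stage]


# Defaults in the script: stage=0, stop_stage=0, so only the Aishell-1 download runs.
print(stages_to_run(0, 0))
# Setting stage=1 and stop_stage=2 would skip stage 0 and run the other two.
print(stages_to_run(1, 2))
```

As shown in the diff, `stage` and `stop_stage` are assigned near the top of the script, so selecting other stages means editing those two values.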
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../../multi_zh-hans/ASR/zipformer/asr_datamodule.py |