This is the code of the paper: Swin-MSTP: Swin Transformer with Multi-Scale Temporal Perception for Continuous Sign Language Recognition
- This project is implemented in PyTorch (>1.8), so please install PyTorch first.
- ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding.
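ctcdecode wraps a C++ beam search decoder. As a rough illustration of what CTC decoding does (pick labels per frame, collapse repeats, drop blanks), here is a toy pure-Python greedy best-path decoder; this is a simplification, not the ctcdecode API or the beam search the repo actually uses.

```python
# Toy sketch of CTC best-path decoding (not the ctcdecode API):
# take the argmax label per frame, collapse consecutive repeats, drop blanks.

def ctc_greedy_decode(frame_probs, blank=0):
    """frame_probs: list of per-frame probability lists over the label set."""
    path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Four frames over labels {0: blank, 1: 'MORGEN', 2: 'SONNE'}
probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
]
print(ctc_greedy_decode(probs))  # [1, 2]
```

Beam search keeps the top-k label prefixes per frame instead of a single path, which is why ctcdecode gives better WER than this greedy variant.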
- sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to get sclite for evaluation. After installation, create a soft link to sclite:
  ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
- Download the RWTH-PHOENIX-Weather 2014 dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
- Modify the dataset_preprocess.py file to reflect the location of the dataset.
- The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences:
  cd ./preprocess
  python data_preprocess.py --process-image --multiprocessing
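The gloss dictionary maps each gloss in the training annotations to an integer index for CTC training (index 0 is conventionally reserved for the blank). The sketch below shows the idea on toy data; the real annotation format, file paths, and dictionary layout live in the preprocessing script and are assumptions here.

```python
# Hedged sketch: build a gloss dictionary from space-separated gloss
# annotations. Index 0 is left free for the CTC blank label.

def build_gloss_dict(annotations):
    """Return {gloss: [index, count]} over all annotation sentences."""
    counts = {}
    for sentence in annotations:
        for gloss in sentence.split():
            counts[gloss] = counts.get(gloss, 0) + 1
    return {gloss: [idx, counts[gloss]]
            for idx, gloss in enumerate(sorted(counts), start=1)}

anns = ["MORGEN REGEN", "MORGEN SONNE"]
print(build_gloss_dict(anns))
# {'MORGEN': [1, 2], 'REGEN': [2, 1], 'SONNE': [3, 1]}
```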
To train the Swin-MSTP model on phoenix14, run the command below:
python main.py
The Swin-Small architecture is used by default. If you would like to train your model using the Swin-Tiny structure, change the c2d_type to 'swin_t' in the baseline.yaml file.
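The change amounts to a one-line edit of the config; the fragment below shows only the key named above (the surrounding keys in baseline.yaml are not reproduced here):

```yaml
# baseline.yaml (fragment; all other keys unchanged)
c2d_type: swin_t   # Swin-Tiny backbone; the default value selects Swin-Small
```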
To evaluate the trained model, run the command below:
python main.py --load-weights {name}.pt --phase test
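The results below are reported as word error rate (WER) over glosses. The official numbers come from sclite, which also breaks errors down into substitutions, deletions, and insertions; as a reference point, here is a minimal pure-Python WER as word-level edit distance divided by reference length.

```python
# Hedged sketch: WER = word-level edit distance / reference length.
# The repo's reported numbers come from sclite, not this function.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("MORGEN VIEL SONNE", "MORGEN SONNE"))  # 1 deletion / 3 words ≈ 0.333
```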
Model | WER on Dev | WER on Test | Pretrained model
---|---|---|---
Swin-MSTP (Tiny) | 18.9 | 19.0 | [GoogleDrive]
Swin-MSTP (Small) | 18.1 | 18.7 | [GoogleDrive]
The framework has also been trained on Phoenix2014-T, CSL, and CSL-Daily. The models are available upon request.
The code is based on VAC. We thank them for their work!