Swin-MSTP

This is the code for the paper: Swin-MSTP: Swin Transformer with Multi-Scale Temporal Perception for Continuous Sign Language Recognition

The code will be added soon ...

Proposed Swin-MSTP Framework

GradCam Visualization


Prerequisites

  • This project is implemented in PyTorch (>1.8), so please install PyTorch first.

  • ctcdecode==0.4 [parlance/ctcdecode], for beam-search decoding.

  • sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to obtain sclite for evaluation. After installation, create a soft link to sclite:
    ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite

Data Preparation

  1. Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.

  2. Modify the dataset_preprocess.py file to reflect the location of the dataset.

  3. The original frames are 210x260; we resize them to 256x256 for augmentation. Run the following command to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python data_preprocess.py --process-image --multiprocessing
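
The gloss-dictionary step above can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: the annotation-file layout (pipe-separated fields with the gloss sequence last) and the reservation of index 0 for the CTC blank are assumptions.

```python
def build_gloss_dict(annotation_lines):
    """Map each gloss to [index, count]; index 0 is assumed reserved for the CTC blank."""
    gloss_dict = {}
    for line in annotation_lines:
        # Assumed line layout: "video_id|signer|GLOSS1 GLOSS2 ..."
        glosses = line.strip().split("|")[-1].split()
        for g in glosses:
            if g not in gloss_dict:
                # Assign the next free index (starting from 1) and a zero count.
                gloss_dict[g] = [len(gloss_dict) + 1, 0]
            gloss_dict[g][1] += 1
    return gloss_dict
```

For example, `build_gloss_dict(["v1|s1|HELLO WORLD", "v2|s2|HELLO"])` yields `{"HELLO": [1, 2], "WORLD": [2, 1]}`.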

Training

To train the Swin_MSTP model on phoenix14, run the command below:

python main.py

The Swin-Small architecture is used by default. If you would like to train your model using the Swin-Tiny structure, change c2d_type to 'swin_t' in the baseline.yaml file.
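
A minimal fragment showing the change (only the c2d_type key is confirmed by the text above; the surrounding layout of baseline.yaml is an assumption):

```yaml
# baseline.yaml — backbone selection
c2d_type: swin_t   # Swin-Tiny; the default is the Swin-Small variant
```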

Inference

To evaluate the trained model, run the command below:

python main.py --load-weights {name}.pt --phase test

Results

| Model             | WER on Dev | WER on Test | Pretrained model |
|-------------------|------------|-------------|------------------|
| Swin-MSTP (tiny)  | 18.9       | 19.0        | [GoogleDrive]    |
| Swin-MSTP (small) | 18.1       | 18.7        | [GoogleDrive]    |

The framework has also been trained on Phoenix2014-T, CSL, and CSL-Daily. These models are available upon request.

Acknowledgment

The code is based on VAC. We thank them for their work!