Swin-MSTP

This is the code for the paper: Swin-MSTP: Swin Transformer with Multi-Scale Temporal Perception for Continuous Sign Language Recognition

The code will be added soon ...

Proposed Swin-MSTP Framework

GradCam Visualization


Prerequisites

  • This project is implemented in PyTorch (>1.8), so please install PyTorch first.

  • ctcdecode==0.4 [parlance/ctcdecode], for beam-search decoding.

  • sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to obtain sclite for evaluation. After installation, create a soft link to sclite:
    ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite

Data Preparation

  1. Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.

  2. Modify the dataset_preprocess.py file to reflect the location of the dataset.

  3. The original frames are 210x260; we resize them to 256x256 for augmentation. Run the following command to generate the gloss dictionary and resize the image sequences.

    cd ./preprocess
    python data_preprocess.py --process-image --multiprocessing
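
The gloss-dictionary step above can be sketched as follows. This is a minimal illustration, not the repo's actual implementation: the annotation-file layout (pipe-separated fields with the gloss sequence last) and the reservation of index 0 for the CTC blank are assumptions.

```python
def build_gloss_dict(annotation_lines):
    """Map each gloss to [index, count]; index 0 is assumed reserved for the CTC blank."""
    gloss_dict = {}
    for line in annotation_lines:
        # Assumed line layout: "video_id|signer|GLOSS1 GLOSS2 ..."
        glosses = line.strip().split("|")[-1].split()
        for g in glosses:
            if g not in gloss_dict:
                # Assign the next free index (starting from 1) and a zero count.
                gloss_dict[g] = [len(gloss_dict) + 1, 0]
            gloss_dict[g][1] += 1
    return gloss_dict
```

For example, `build_gloss_dict(["v1|s1|HELLO WORLD", "v2|s2|HELLO"])` yields `{"HELLO": [1, 2], "WORLD": [2, 1]}`.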

Training

To train the Swin_MSTP model on phoenix14, run the command below:

python main.py

The Swin-Small architecture is used by default. If you would like to train your model using the Swin-Tiny structure, change c2d_type to 'swin_t' in the baseline.yaml file.
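
A minimal fragment showing the change (only the c2d_type key is confirmed by the text above; the surrounding layout of baseline.yaml is an assumption):

```yaml
# baseline.yaml — backbone selection
c2d_type: swin_t   # Swin-Tiny; the default is the Swin-Small variant
```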

Inference

To evaluate the trained model, run the command below:

python main.py --load-weights {name}.pt --phase test

Results

| Model             | WER on Dev | WER on Test | Pretrained model |
|-------------------|------------|-------------|------------------|
| Swin-MSTP (tiny)  | 18.9       | 19.0        | [GoogleDrive]    |
| Swin-MSTP (small) | 18.1       | 18.7        | [GoogleDrive]    |

The framework has also been trained on Phoenix2014-T, CSL, and CSL-Daily. These models are available upon request.

Acknowledgment

The code is based on VAC. We thank them for their work!