HST-SLR: Hierarchical Sub-action Tree with Large Language Models for Continuous Sign Language Recognition
This is the official implementation of the paper by Dejie Yang, Zhu Xu, Xinjie Gao, and Yang Liu.
Continuous sign language recognition (CSLR) aims to transcribe untrimmed videos into glosses, which are typically textual words. Recent studies indicate that the lack of large datasets and precise annotations has become a bottleneck for CSLR due to insufficient training data. To address this, some works have developed cross-modal solutions to align visual and textual modalities. However, they typically extract textual features from glosses without fully utilizing their knowledge. In this paper, we propose the Hierarchical Sub-action Tree (HST), termed HST-CSLR, to efficiently combine gloss knowledge with visual representation learning. By incorporating gloss-specific knowledge from large language models, our approach leverages textual information more effectively. Specifically, we construct an HST for textual information representation, aligning visual and textual modalities step-by-step and benefiting from the tree structure to reduce computational complexity. Additionally, we impose a contrastive alignment enhancement to bridge the gap between the two modalities. Experiments on four datasets (PHOENIX-2014, PHOENIX-2014T, CSL-Daily, and Sign Language Gesture) demonstrate the effectiveness of our HST-CSLR.
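For intuition, here is a minimal, hypothetical Python sketch of the two ideas above; it is not the authors' implementation, and `node.children`, `node.prototype`, and the temperature are illustrative assumptions. It shows (1) greedy top-down matching of a visual feature against tree-node prototypes, which keeps the number of comparisons proportional to tree depth rather than vocabulary size, and (2) a symmetric InfoNCE-style contrastive loss pulling matched visual/textual pairs together.

```python
# Conceptual sketch only -- not the paper's code. `node` is a hypothetical tree
# node with `.children` (a list of nodes) and `.prototype` (a 1-D feature tensor).
import torch
import torch.nn.functional as F

def tree_align(visual_feat, node):
    """Descend the HST greedily: at each level, compare the visual feature with
    the children's prototypes only, instead of with every gloss at once."""
    while node.children:
        sims = torch.stack([F.cosine_similarity(visual_feat, child.prototype, dim=0)
                            for child in node.children])
        node = node.children[int(sims.argmax())]
    return node  # a leaf, corresponding to a single gloss

def contrastive_alignment(visual, textual, tau=0.07):
    """Symmetric InfoNCE loss over a batch of matched visual/textual features."""
    v = F.normalize(visual, dim=-1)   # (B, D)
    t = F.normalize(textual, dim=-1)  # (B, D)
    logits = v @ t.t() / tau          # (B, B) similarity matrix
    labels = torch.arange(v.size(0), device=v.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```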
- Create a Conda environment:
conda create -n HST python=3.7 -y && conda activate HST
- Install PyTorch with Conda:
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
You may instead install a PyTorch build matching your own CUDA version, but PyTorch >= 1.13 is recommended for compatibility with ctcdecode; otherwise errors may occur. (A quick sanity check is sketched after this list.)
- Install ctcdecode==0.4 [parlance/ctcdecode] for beam search decoding. (ctcdecode is supported only on the Linux platform.)
- Install the other requirements:
pip install -r requirements.txt
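As referenced above, a quick way to confirm the environment is usable (an illustrative snippet, not part of the repository):

```python
# Verify the PyTorch version and that the Linux-only ctcdecode extension built.
import torch

major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (1, 13), "PyTorch >= 1.13 is recommended for ctcdecode"
print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

import ctcdecode  # raises ImportError if the build failed
print("ctcdecode imported successfully")
```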
You can choose any one of the following datasets to verify the effectiveness of HST-SLR.
- Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
- After the download finishes, extract the dataset. We suggest creating a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
- Run the following commands to generate the gloss dictionary and resize the image sequences (a simplified sketch of this step appears after the commands):
cd ./preprocess
python dataset_preprocess.py --process-image --multiprocessing
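For reference, a simplified sketch of what this preprocessing step does, assuming PNG frames and a 256x256 target size (the real logic lives in ./preprocess/dataset_preprocess.py and may differ):

```python
# Simplified sketch, not the repository script: build a gloss->index dictionary
# from the annotation sentences and resize every frame in place.
import glob
import cv2

def build_gloss_dict(annotation_sentences):
    """Map each gloss to [index, count]; index 0 is left for the CTC blank."""
    gloss_dict = {}
    for sentence in annotation_sentences:
        for gloss in sentence.split():
            entry = gloss_dict.setdefault(gloss, [len(gloss_dict) + 1, 0])
            entry[1] += 1
    return gloss_dict

def resize_frames(frame_dir, size=(256, 256)):
    """Resize all frames in a directory to the assumed target resolution."""
    for path in glob.glob(f"{frame_dir}/*.png"):
        cv2.imwrite(path, cv2.resize(cv2.imread(path), size))
```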
- Download the RWTH-PHOENIX-Weather 2014T Dataset [download link].
- After the download finishes, extract the dataset. We suggest creating a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T
- Run the following commands to generate the gloss dictionary and resize the image sequences:
cd ./preprocess
python T_process.py
python dataset_preprocess-T.py --process-image --multiprocessing
- Request the CSL-Daily Dataset from this website [download link].
- After the download finishes, extract the dataset. We suggest creating a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET ./dataset/CSL-Daily
- Run the following commands to generate the gloss dictionary and resize the image sequences:
cd ./preprocess
python dataset_preprocess-CSL-Daily.py --process-image --multiprocessing
Choose the dataset from phoenix2014/phoenix2014T/CSLDaily and run the following commands to generate a description for each word (a sketch of the underlying LLM call follows the commands). You should modify the api_key in description_generate.py (line 10). If you want to generate the descriptions yourself, make sure to first remove the file description_{target_dataset}.txt from the directory.
cd ./hst_build/generation
python description_generate.py --dataset target_dataset
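The script queries a large language model once per gloss. The sketch below shows the general shape of such a call, assuming an OpenAI-style client; the prompt wording and model name are guesses, not the ones used in description_generate.py.

```python
# Hypothetical sketch of per-gloss description generation; the api_key mirrors
# the one configured at line 10 of description_generate.py.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")

def describe_gloss(gloss: str) -> str:
    """Ask the LLM to decompose one gloss into simple sub-actions."""
    prompt = (f"Describe how the sign language gloss '{gloss}' is performed, "
              f"as a short sequence of simple sub-actions.")
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```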
Choose the dataset from phoenix2014/phoenix2014T/CSLDaily and run the following commands to build the HST from the descriptions obtained above (a conceptual sketch follows the commands).
cd ./hst_build
python cluster.py --dataset target_dataset
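Conceptually, this step embeds each gloss description and clusters the embeddings hierarchically, so that glosses sharing sub-actions end up under common ancestors. A rough sketch under assumed tooling; the encoder, linkage method, and threshold here are illustrative and not taken from cluster.py:

```python
# Illustrative sketch: hierarchical clustering of description embeddings.
from scipy.cluster.hierarchy import fcluster, linkage
from sentence_transformers import SentenceTransformer

descriptions = ["...", "..."]  # one LLM-generated description per gloss
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed text encoder
embeddings = encoder.encode(descriptions)          # (num_glosses, dim) array

# Build a dendrogram over the embeddings and cut it to obtain one tree level.
dendrogram = linkage(embeddings, method="average", metric="cosine")
labels = fcluster(dendrogram, t=0.5, criterion="distance")
```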
Choose the dataset from phoenix2014/phoenix2014T/CSLDaily and run the following commands to set the prototype for each tree node, generate the update matrix used for updating, and find the tree nodes that contain a given word (a sketch of the prototype idea follows the commands).
python prototype_set.py --dataset target_dataset
python update_matrix.py --dataset target_dataset
python search_matrix.py --dataset target_dataset
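The idea behind the node prototypes is straightforward: an internal node can be represented by, for example, the normalized mean of its descendants' description embeddings. A hedged one-function sketch; prototype_set.py may compute this differently:

```python
# Illustrative prototype computation for one tree node.
import numpy as np

def node_prototype(descendant_embeddings: np.ndarray) -> np.ndarray:
    """Normalized mean of the (N, D) embeddings of a node's descendants."""
    prototype = descendant_embeddings.mean(axis=0)
    return prototype / np.linalg.norm(prototype)
```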
We provide all of the generated results mentioned above. Please download the zip file through the link Google Drive, put it in ./HDT_prototype, and unzip it.
| Dev WER | Test WER | Pretrained model |
| --- | --- | --- |
| 17.9% | 18.2% | [Google Drive] |
To evaluate the pretrained model on PHOENIX2014, run the command below:
python main.py --config ./configs/baseline_14.yaml --device your_device --work-dir ./work_dir/your_expname/ --load-weights path_to_weight.pt --phase test
| Dev WER | Test WER | Pretrained model |
| --- | --- | --- |
| 17.4% | 19.1% | [Google Drive] |
To evaluate the pretrained model on PHOENIX2014-T, run the command below:
python main.py --config ./configs/baseline_14T.yaml --device your_device --work-dir ./work_dir/your_expname/ --load-weights path_to_weight.pt --phase test
| Dev WER | Test WER | Pretrained model |
| --- | --- | --- |
| 27.5% | 27.4% | [Google Drive] |
To evaluate the pretrained model on CSL-Daily, run the command below:
python main.py --config ./configs/baseline_CD.yaml --device your_device --work-dir ./work_dir/your_expname/ --load-weights path_to_weight.pt --phase test
To train the SLR model on PHOENIX2014, run the command below:
python main.py --config ./configs/baseline_14.yaml --device your_device --work-dir ./work_dir/your_expname/
To train the SLR model on PHOENIX2014-T, run the command below:
python main.py --config ./configs/baseline_14T.yaml --device your_device --work-dir ./work_dir/your_expname/
To train the SLR model on CSL-Daily, run the command below:
python main.py --config ./configs/baseline_CD.yaml --device your_device --work-dir ./work_dir/your_expname/
We also conduct experiments on the Sign Language Gesture dataset. First, download the dataset through the link Google Drive and the pretrained weights through the link Google Drive. Put the files in ./SLG and unzip them, then run the command below:
python train.py --device your_device
@inproceedings{yang2025hstslr,
  title={HST-SLR: Hierarchical Sub-action Tree with Large Language Models for Continuous Sign Language Recognition},
  author={Yang, Dejie and Xu, Zhu and Gao, Xinjie and Liu, Yang},
  booktitle={IEEE International Conference on Multimedia \& Expo (ICME)},
  year={2025}
}