wonjun-dev/Google-American-Fingerspelling-Recognition

TL;DR

We used a conformer-like model consisting of a transformer encoder, 1D convolution with CBAM, and a bi-LSTM. The overall model size is about 16 MB after INT8 quantization. The training objective is CTC with an auxiliary InterCTC loss.

Data preprocessing

  • Used landmarks
    • 1 for nose, 21 for dominant hand, 40 for lips
    • x, y coordinates
  • Normalization
    • Standardized offsets from the nose coordinates
  • Feature engineering (see the sketch after this list)
    • Concatenation of the normalized locations, differences to the next frame, and pairwise joint distances of the hand landmarks
    • 582 dims in total
  • Removed samples whose input length is shorter than twice the target phrase length
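
A rough NumPy sketch of the feature construction described above. The landmark layout, the standardization scheme, and the use of both forward and backward frame differences are assumptions, the last one chosen only because it makes the dimensions add up to the stated 582 (124 + 124 + 124 + 210); the write-up itself only mentions the difference to the next frame.

```python
import numpy as np

# Hypothetical landmark layout for one sample: (T, 62, 2) arrays with
# 1 nose, 21 dominant-hand and 40 lip landmarks, x/y coordinates.
NOSE, HAND = slice(0, 1), slice(1, 22)

def make_features(xy: np.ndarray) -> np.ndarray:
    """xy: (T, 62, 2) raw coordinates -> (T, 582) feature matrix."""
    # Express every landmark as an offset from the nose, then standardize
    # (one simple standardization; the exact scheme in the solution may differ).
    rel = xy - xy[:, NOSE, :]
    rel = (rel - rel.mean()) / (rel.std() + 1e-6)

    loc = rel.reshape(len(rel), -1)                 # (T, 124) normalized locations

    # Temporal differences (forward and backward are an assumption, see above).
    fwd = np.diff(loc, axis=0, append=loc[-1:])     # (T, 124)
    bwd = np.diff(loc, axis=0, prepend=loc[:1])     # (T, 124)

    # Pairwise joint distances between the 21 hand landmarks: 21*20/2 = 210 dims.
    hand = rel[:, HAND, :]
    i, j = np.triu_indices(21, k=1)
    dist = np.linalg.norm(hand[:, i] - hand[:, j], axis=-1)  # (T, 210)

    return np.concatenate([loc, fwd, bwd, dist], axis=1)     # (T, 582)

def keep_sample(num_frames: int, phrase: str) -> bool:
    """Filtering rule: drop samples shorter than twice the target phrase length."""
    return num_frames >= 2 * len(phrase)
```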

Data augmentation

  • Horizontal flip of the landmarks
  • Interpolation
  • Affine transform (all three are sketched after this list)
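
A minimal sketch of the three augmentations, assuming nose-relative (x, y) landmark arrays of shape (T, num_points, 2) and assuming "interpolation" means temporal resampling of frames; the parameter ranges are illustrative, not the values used in training.

```python
import numpy as np

def hflip(xy: np.ndarray) -> np.ndarray:
    """Horizontal flip: negate x (assumes coordinates are centered on the nose)."""
    out = xy.copy()
    out[..., 0] *= -1
    return out

def interpolate_time(xy: np.ndarray, scale: float) -> np.ndarray:
    """Resample the sequence to round(scale * T) frames by linear interpolation."""
    T = len(xy)
    new_T = max(2, int(round(T * scale)))
    flat = xy.reshape(T, -1)
    src = np.linspace(0, T - 1, new_T)
    cols = [np.interp(src, np.arange(T), flat[:, d]) for d in range(flat.shape[1])]
    return np.stack(cols, axis=1).reshape(new_T, *xy.shape[1:])

def random_affine(xy: np.ndarray, max_rot=10.0, max_scale=0.1, max_shift=0.05):
    """Apply one random rotation/scale/shift to every frame's 2D coordinates."""
    theta = np.deg2rad(np.random.uniform(-max_rot, max_rot))
    s = 1.0 + np.random.uniform(-max_scale, max_scale)
    shift = np.random.uniform(-max_shift, max_shift, size=2)
    R = s * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
    return xy @ R.T + shift
```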

Model

  • 2 stacked encoder blocks, each with a Transformer, a 1D convolution with CBAM, and a Bi-LSTM (see the sketch after this list)
    • hidden dim: 352
  • CTC loss, plus an InterCTC loss on the output of the first encoder block
  • 17M parameters, quantized to INT8
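
Below is a minimal PyTorch sketch of how two such encoder blocks with an intermediate CTC head could be wired. The module names, head count, kernel size, CBAM reduction ratio, and vocabulary size are assumptions, not the competition code; only the overall layout (Transformer, then 1D conv with CBAM, then Bi-LSTM, stacked twice at hidden dim 352, with a CTC head after each block) follows the description above.

```python
import torch
import torch.nn as nn

class CBAM1d(nn.Module):
    """Convolutional Block Attention Module for (B, C, T) feature maps."""
    def __init__(self, ch, reduction=8, k=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv1d(2, 1, k, padding=k // 2)

    def forward(self, x):                              # x: (B, C, T)
        # Channel attention from average- and max-pooled descriptors.
        ca = torch.sigmoid(self.mlp(x.mean(-1)) + self.mlp(x.amax(-1)))
        x = x * ca.unsqueeze(-1)
        # Spatial (temporal) attention over channel statistics.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

class EncoderBlock(nn.Module):
    """Transformer -> 1D conv with CBAM -> Bi-LSTM."""
    def __init__(self, dim=352, heads=4):
        super().__init__()
        self.transformer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4,
            batch_first=True, norm_first=True)
        self.conv = nn.Conv1d(dim, dim, kernel_size=5, padding=2)
        self.cbam = CBAM1d(dim)
        self.lstm = nn.LSTM(dim, dim // 2, batch_first=True, bidirectional=True)

    def forward(self, x):                              # x: (B, T, dim)
        x = self.transformer(x)
        x = self.cbam(self.conv(x.transpose(1, 2))).transpose(1, 2)
        x, _ = self.lstm(x)
        return x

class FingerspellingModel(nn.Module):
    def __init__(self, in_dim=582, dim=352, vocab=60):   # vocab size is an assumption
        super().__init__()
        self.proj = nn.Linear(in_dim, dim)
        self.block1, self.block2 = EncoderBlock(dim), EncoderBlock(dim)
        self.inter_head = nn.Linear(dim, vocab)        # InterCTC head after block 1
        self.head = nn.Linear(dim, vocab)              # final CTC head

    def forward(self, x):                              # x: (B, T, 582)
        x = self.block1(self.proj(x))
        inter_logits = self.inter_head(x)
        x = self.block2(x)
        return self.head(x), inter_logits
```

With this layout, the training objective can be formed as the final CTC loss plus a weighted InterCTC loss on `inter_logits` (e.g. `loss = ctc + 0.3 * inter_ctc`); the weight here is an assumption.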

Train

  • 2-stage training (300 + 200 epochs)
    • Supplemental and train data for the first 300 epochs
    • Train data only for the last 200 epochs
  • Ranger optimizer
  • Cosine decay scheduler with a 12-epoch warmup
  • AWP (sketched after this list)
    • It prevents the validation loss from diverging, but it does not seem to improve the edit distance.
    • Used to enable training for many epochs.
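
A minimal sketch of the AWP (Adversarial Weight Perturbation) idea referenced above: perturb the weights along the gradient direction before a second backward pass, then restore them. The per-parameter scaling and the hyper-parameters are assumptions, not the values from this solution.

```python
import torch

class AWP:
    """Simplified Adversarial Weight Perturbation helper (illustrative only)."""
    def __init__(self, model, adv_lr=1e-4, adv_eps=1e-2):
        self.model, self.adv_lr, self.adv_eps = model, adv_lr, adv_eps
        self.backup = {}

    def attack(self):
        """Save weights, then step them along the gradient, bounded by adv_eps."""
        for name, p in self.model.named_parameters():
            if p.requires_grad and p.grad is not None:
                self.backup[name] = p.data.clone()
                r = self.adv_lr * p.grad / (p.grad.norm() + 1e-12) * (p.data.abs() + 1e-12)
                p.data.add_(r.clamp(-self.adv_eps, self.adv_eps))

    def restore(self):
        """Put the original (unperturbed) weights back before the optimizer step."""
        for name, p in self.model.named_parameters():
            if name in self.backup:
                p.data.copy_(self.backup[name])
        self.backup = {}

# Typical usage inside a training step:
#   loss.backward(); awp.attack(); adv_loss = criterion(model(x), y)
#   adv_loss.backward(); awp.restore(); optimizer.step(); optimizer.zero_grad()
```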

What did not work

  • Augmentation
    • Time, spatial, and landmark masking
    • Time reversal
  • Autoregressive decoder (a sketch of the joint loss follows this list)
    • Joint loss: CTC for the encoder and cross-entropy for the decoder
    • Inference is slower and the parameter count larger than the CTC encoder alone, while the gain in performance was not clear, so it was dropped.
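
For reference, a hedged sketch of what the joint objective for the discarded encoder-decoder variant could look like; the tensor shapes and the equal 0.5/0.5 weighting are assumptions.

```python
import torch.nn.functional as F

def joint_loss(enc_log_probs, dec_logits, targets, input_lens, target_lens, blank=0):
    """enc_log_probs: (T, B, V) log-probabilities from the CTC encoder,
    dec_logits: (B, L, V) autoregressive decoder outputs, targets: (B, L)."""
    ctc = F.ctc_loss(enc_log_probs, targets, input_lens, target_lens, blank=blank)
    ce = F.cross_entropy(dec_logits.transpose(1, 2), targets)  # (B, V, L) vs (B, L)
    return 0.5 * ctc + 0.5 * ce
```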

About

33rd place solution - Google American Fingerspelling Recognition, Kaggle
