This repository contains the code to reproduce the results of our paper "Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis" (CVPRW 2024, AI City Challenge Track 2).
- Clone the repo
git clone --recursive https://github.com/UCF-SST-Lab/AICity-2024-Track2-CVPRW
- Create a virtual environment with conda
conda create -n PDVC python=3.7
source activate PDVC
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
conda install ffmpeg
pip install -r requirement.txt
- Compile the deformable attention layer (requires GCC >= 5.4).
cd pdvc/ops
sh make.sh
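Before running `make.sh`, it can help to confirm that the local compiler meets the GCC >= 5.4 requirement stated above. The following is a minimal sketch, not part of this repository: the helper names (`parse_version`, `gcc_is_recent_enough`, `installed_gcc_version`) are hypothetical, and it assumes `gcc` is the compiler that the build will pick up.

```python
import re
import shutil
import subprocess

MIN_GCC = (5, 4)  # minimum version stated in the compile step above


def parse_version(text):
    """Turn a version string such as '9.4.0' or '11' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def gcc_is_recent_enough(version_text, minimum=MIN_GCC):
    """Return True when the parsed version is at least `minimum`."""
    return parse_version(version_text) >= minimum


def installed_gcc_version():
    """Return the output of `gcc -dumpversion`, or None if gcc is not on PATH."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "-dumpversion"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # Python 3.7-compatible way to get text output
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc not found on PATH")
    else:
        status = "OK" if gcc_is_recent_enough(version) else "too old"
        print("gcc {}: {}".format(version, status))
```

The check only inspects the compiler version; it does not verify that the CUDA toolkit matches the installed PyTorch build.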
The CLIP features (Training/Test) extracted from BDD and WTS can be downloaded via Google Drive
# Training
config_path=cfgs/bdd_veh_clip_pdvcl.yml
python train.py --cfg_path ${config_path} --gpu_id ${GPU_ID} --epoch=30
# The script evaluates the model at the specified evaluation epochs. Results and logs are saved in `./save`.
# Evaluation
eval_folder=bdd_eval # specify the folder to be evaluated
python eval.py --eval_folder ${eval_folder} --eval_transformer_input_type queries --gpu_id ${GPU_ID}
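The commands above reference `${GPU_ID}`, which must be defined in the shell before they are run. As an illustrative sketch only, the same invocations can be assembled from Python; the helper names below are hypothetical, the fallback GPU id of `0` is an assumption, and only the flags shown in the commands above are used.

```python
import os
import shlex


def build_train_command(cfg_path, gpu_id, epoch=30):
    """Assemble the training invocation shown above as an argument list."""
    return [
        "python", "train.py",
        "--cfg_path", cfg_path,
        "--gpu_id", str(gpu_id),
        "--epoch={}".format(epoch),
    ]


def build_eval_command(eval_folder, gpu_id):
    """Assemble the evaluation invocation shown above as an argument list."""
    return [
        "python", "eval.py",
        "--eval_folder", eval_folder,
        "--eval_transformer_input_type", "queries",
        "--gpu_id", str(gpu_id),
    ]


if __name__ == "__main__":
    # Assumption: default to GPU 0 when GPU_ID is not set in the environment.
    gpu_id = os.environ.get("GPU_ID", "0")
    for cmd in (
        build_train_command("cfgs/bdd_veh_clip_pdvcl.yml", gpu_id),
        build_eval_command("bdd_eval", gpu_id),
    ):
        # Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the arguments as a list avoids shell-quoting problems if a config path or folder name contains spaces.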
bash run.sh
Note: in the bash file, `--load=save/XXX` has to be updated to point to the folder containing the trained models.
python formatting_submission.py
Model | Features | Data | BLEU4 | METEOR | ROUGE-L | CIDEr | S2 | config_path |
---|---|---|---|---|---|---|---|---|
PDVC_light | CLIP | BDD | 0.2102 | 0.4435 | 0.4705 | 0.8698 | 30.2821 | cfgs/bdd_xxx_clip_pdvcl.yml |
PDVC_light | CLIP | WTS | 0.2005 | 0.4115 | 0.4416 | 0.5573 | 27.7347 | cfgs/train_wts_xxx_xxx_pdvcl_finetune.yml |
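
This README does not define the S2 column. The reported values appear consistent with averaging the four caption metrics after rescaling BLEU4, METEOR, and ROUGE-L by 100 and CIDEr by 10; this formula is our assumption, inferred from the numbers in the table, and the function name `s2_score` is hypothetical. A minimal sketch of that arithmetic:

```python
def s2_score(bleu4, meteor, rouge_l, cider):
    """Average of the four metrics after rescaling (assumed formula, see above)."""
    return (100.0 * bleu4 + 100.0 * meteor + 100.0 * rouge_l + 10.0 * cider) / 4.0


if __name__ == "__main__":
    # Metric values copied from the results table above.
    rows = {
        "BDD": (0.2102, 0.4435, 0.4705, 0.8698),
        "WTS": (0.2005, 0.4115, 0.4416, 0.5573),
    }
    for name, metrics in rows.items():
        print("{}: {:.2f}".format(name, s2_score(*metrics)))
```

With the rounded metrics in the table, this agrees with the reported S2 values to about two decimal places; the small remaining differences would be expected if S2 was computed from unrounded metrics.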
If you find this repo helpful, please consider citing:
@article{shoman2024enhancing,
title={Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis},
author={Shoman, Maged and Wang, Dongdong and Aboah, Armstrong and Abdel-Aty, Mohamed},
journal={arXiv preprint arXiv:2404.08229},
year={2024}
}
@article{wang20248th,
title={The 8th AI City Challenge},
author={Wang, Shuo and Anastasiu, David C and Tang, Zheng and Chang, Ming-Ching and Yao, Yue and Zheng, Liang and Rahman, Mohammed Shaiqur and Arya, Meenakshi S and Sharma, Anuj and Chakraborty, Pranamesh and others},
journal={arXiv preprint arXiv:2404.09432},
year={2024}
}
Our implementation is modified from PDVC.
The implementation of video feature extraction is modified based on FrozenBiLM.
The implementation of Deformable Transformer is mainly based on Deformable DETR.
The implementation of the captioning head is based on ImageCaptioning.pytorch.
We thank the authors for their efforts.