Bidirectionally Learning Dense Spatio-temporal Feature Propagation Network for Unsupervised Video Object Segmentation (ACM MM 2022)
The training and testing experiments are conducted with PyTorch 1.8.1 on two GeForce RTX 2080 Ti GPUs (11 GB memory each).
- Python 3.6

```shell
conda create -n dbsnet python=3.6
```
Other minor Python modules can be installed by running

```shell
pip install -r requirements.txt
```
In the paper, we use the following three publicly available datasets for training. Here are the steps to prepare the data:
- DAVIS-16: We use all the data in the train subset of DAVIS-16. Note that you should download the DAVIS-17 dataset; the code will automatically select the DAVIS-16 subset for training.
- YouTubeVOS-2018: We sample the training data every 5 frames in YouTubeVOS-2018. You can sample a different number of frames to train the model by modifying the `--num_frames` parameter.
- FBMS: We use all the data in the train subset of FBMS.
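The every-5-frames sampling described above can be sketched as follows. This is a minimal illustration, not the repository's dataset loader; the function name `sample_frames` is hypothetical, and the real handling of `--num_frames` lives in the training code.

```python
# Minimal sketch of uniform temporal sampling, assuming the frames of a
# clip are given as a sorted list of file names.
def sample_frames(frames, step=5):
    """Return every `step`-th frame, mirroring the YouTubeVOS-2018 sampling."""
    return frames[::step]

frames = [f"{i:05d}.jpg" for i in range(20)]
print(sample_frames(frames))  # ['00000.jpg', '00005.jpg', '00010.jpg', '00015.jpg']
```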
Please follow the instructions of RAFT to prepare the optical flow.
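RAFT's demo scripts commonly store the estimated flow as Middlebury `.flo` files; the sketch below shows that file format so you can verify or load the precomputed flow. This is an assumption about the on-disk format, not the repository's actual script, and the flow array itself (shape H x W x 2) must come from running RAFT.

```python
import numpy as np

def write_flo(path, flow):
    """Write an (H, W, 2) float flow field in the Middlebury .flo format."""
    h, w, _ = flow.shape
    with open(path, "wb") as f:
        f.write(b"PIEH")                           # magic bytes (float 202021.25)
        np.array([w, h], dtype=np.int32).tofile(f)  # width, height as int32
        flow.astype(np.float32).tofile(f)           # interleaved (u, v) values

def read_flo(path):
    """Read a .flo file back into an (H, W, 2) float32 array."""
    with open(path, "rb") as f:
        assert f.read(4) == b"PIEH", "not a .flo file"
        w, h = np.fromfile(f, dtype=np.int32, count=2)
        return np.fromfile(f, dtype=np.float32).reshape(h, w, 2)
```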
Download the pre-trained MobileViT backbone and put it into the `pretrained` folder.
- First, train the model using the YouTubeVOS-2018, DAVIS-16, and FBMS datasets.

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 main.py
```
- Second, fine-tune the model using the DAVIS-16 and FBMS datasets.

```shell
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 main.py --finetune first_stage_weight_path
```
- Run the following to generate the segmentation results.

```shell
python tool.py --checkpoint model_weight_path --tools test
```
- For the post-processing technique DenseCRF used in the original paper, see DSS-CRF.
- The segmentation results on DAVIS-16, FBMS, DAVSOD, and MCL can be downloaded from Baidu Pan (password: uf21).
- Evaluation Toolbox: We use the standard UVOS evaluation toolbox from DAVIS-16 and the VSOD evaluation toolbox from the DAVSOD benchmark.
- Note: When evaluating J-Mean and F-Mean, binarize the predictions first: set [predict_mask > 127] = 255 and [predict_mask <= 127] = 0.
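The binarization step above can be sketched in a few lines of NumPy. The function name `binarize` is illustrative; apply the same thresholding however your evaluation pipeline loads the predicted masks.

```python
import numpy as np

def binarize(predict_mask):
    """Threshold a grayscale prediction: values above 127 become foreground
    (255), everything else background (0), as required by the evaluation."""
    return np.where(predict_mask > 127, 255, 0).astype(np.uint8)

mask = np.array([[0, 127, 128, 255]], dtype=np.uint8)
print(binarize(mask))  # 0 and 127 map to 0; 128 and 255 map to 255
```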
If you find DBSNet useful for your research, please consider citing the following paper:
```bibtex
@inproceedings{fan2022dbsnet,
  title={Bidirectionally Learning Dense Spatio-temporal Feature Propagation Network for Unsupervised Video Object Segmentation},
  author={Fan, Jiaqing and Su, Tiankang and Zhang, Kaihua and Liu, Qingshan},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={0--0},
  year={2022}
}
```