ATModel can perform five tasks simultaneously: Panoptic Segmentation, Captioning, VQA, Depth Estimation, and OCR. The dataset used for each task is listed below.
Panoptic Segmentation -> ADE20K
Captioning -> VizWiz_Cap
VQA -> VizWiz_VQA
Depth Estimation -> NYU_v2
OCR -> MJSynth (MJ), SynthText (ST), ICDAR_2013 (IC13), ICDAR_2015 (IC15), IIIT5K-Words (IIIT5K), Street View Text (SVT), Street View Text-Perspective (SVTP), CUTE80 (CUTE)
Expected dataset structure for ADE20K
.atmodel_data/
└── seg_datatsets/ADEChallengeData2016/
    ├── images/
    │   ├── training/
    │   │   ├── ADE_train_00000001.jpg
    │   │   └── ...
    │   └── validation/
    │       ├── ADE_val_00000001.jpg
    │       └── ...
    ├── ade20k_panoptic_train/
    │   ├── ADE_train_00000001.png
    │   └── ...
    ├── ade20k_panoptic_val/
    │   ├── ADE_val_00000001.png
    │   └── ...
    ├── ade20k_panoptic_train.json
    └── ade20k_panoptic_val.json
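A quick way to catch a misplaced folder before training is to check the layout programmatically. The following is a minimal sketch and not part of ATModel: the helper name `missing_ade20k_entries` is hypothetical, and the default root `.atmodel_data` and the relative paths are simply copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the expected ADE20K layout above.
ADE20K_ENTRIES = [
    "images/training",
    "images/validation",
    "ade20k_panoptic_train",
    "ade20k_panoptic_val",
    "ade20k_panoptic_train.json",
    "ade20k_panoptic_val.json",
]


def missing_ade20k_entries(data_root=".atmodel_data"):
    """Return the expected ADE20K entries that do not exist under data_root.

    Hypothetical helper for illustration only; it checks existence,
    not the contents of the folders or JSON files.
    """
    base = Path(data_root) / "seg_datatsets" / "ADEChallengeData2016"
    return [entry for entry in ADE20K_ENTRIES if not (base / entry).exists()]


if __name__ == "__main__":
    missing = missing_ade20k_entries()
    if missing:
        print("Missing ADE20K entries:", ", ".join(missing))
    else:
        print("ADE20K layout looks complete.")
```

An empty result only means the expected files and folders are present; it does not validate the annotations themselves.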
Expected dataset structure for VizWiz_Cap
.atmodel_data/
└── captioning_datasets/vizwiz/
    ├── train/
    │   ├── VizWiz_train_00000000.jpg
    │   └── ...
    ├── val/
    │   ├── VizWiz_val_00000000.jpg
    │   └── ...
    ├── test/
    │   ├── VizWiz_test_00000000.jpg
    │   └── ...
    └── annotations/
        ├── train.json
        ├── val.json
        └── test.json
Expected dataset structure for VizWiz_VQA
.atmodel_data/
└── vqa_datasets/vizwiz/
    ├── train/
    │   ├── VizWiz_train_00000000.jpg
    │   └── ...
    ├── val/
    │   ├── VizWiz_val_00000000.jpg
    │   └── ...
    ├── test/
    │   ├── VizWiz_test_00000000.jpg
    │   └── ...
    └── annotations/
        ├── train.json
        ├── val.json
        └── test.json
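Because the captioning and VQA copies of VizWiz share the same split folders, a per-split image count is an easy sanity check after copying the data. The sketch below is illustrative only: `count_vizwiz_images` is a hypothetical helper, and it assumes the images are `.jpg` files placed directly inside `train/`, `val/`, and `test/`, as in the two trees above.

```python
from pathlib import Path


def count_vizwiz_images(dataset_dir):
    """Count the .jpg files in each split folder of a VizWiz dataset directory.

    Hypothetical helper; a missing split folder is reported as 0 images.
    """
    root = Path(dataset_dir)
    counts = {}
    for split in ("train", "val", "test"):
        split_dir = root / split
        if split_dir.is_dir():
            counts[split] = len(list(split_dir.glob("*.jpg")))
        else:
            counts[split] = 0
    return counts


if __name__ == "__main__":
    # Folder names taken from the layouts above.
    for name in ("captioning_datasets", "vqa_datasets"):
        print(name, count_vizwiz_images(Path(".atmodel_data") / name / "vizwiz"))
```

A split that reports 0 images usually points to a wrong folder name or an archive that was extracted one level too deep.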
Please follow the BTS instructions to download the NYU_v2 dataset, then arrange it as shown below.
.atmodel_data/
└── depth_datasets/nyuv2/
    ├── images/
    │   ├── 0.jpg
    │   └── ...
    ├── raw_depths/
    │   ├── 0.png
    │   └── ...
    ├── train.txt
    └── val.txt
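The example file names above (`0.jpg` and `0.png`) suggest that each image shares its file stem with its depth map. Under that assumption, which the layout implies but does not state, the following sketch lists samples that lack a partner. The helper name `unpaired_nyu_samples` is hypothetical, and the format of `train.txt` and `val.txt` is not checked here.

```python
from pathlib import Path


def _stems(folder, pattern):
    """Return the set of file stems matching pattern in folder (empty if folder is missing)."""
    folder = Path(folder)
    if not folder.is_dir():
        return set()
    return {p.stem for p in folder.glob(pattern)}


def unpaired_nyu_samples(nyu_dir):
    """Return (images without a depth map, depth maps without an image).

    Assumes images/<stem>.jpg pairs with raw_depths/<stem>.png.
    Both results are sorted lists of file stems (strings).
    """
    root = Path(nyu_dir)
    image_stems = _stems(root / "images", "*.jpg")
    depth_stems = _stems(root / "raw_depths", "*.png")
    return sorted(image_stems - depth_stems), sorted(depth_stems - image_stems)


if __name__ == "__main__":
    no_depth, no_image = unpaired_nyu_samples(".atmodel_data/depth_datasets/nyuv2")
    print(f"{len(no_depth)} images without depth, {len(no_image)} depth maps without image")
```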
Please follow the ABINet instructions to download the OCR datasets, then arrange them as shown below.
.atmodel_data/
└── ocr_datasets/
    ├── training/
    │   ├── MJ/
    │   │   ├── MJ_train/
    │   │   ├── MJ_valid/
    │   │   └── MJ_test/
    │   └── ST/
    ├── evaluation/
    │   ├── IIIT5k_3000/
    │   ├── SVT/
    │   ├── SVTP/
    │   ├── IC13_857/
    │   ├── IC15_1811/
    │   └── CUTE80/
    └── data/
        └── charset_vn.txt