Preparing Dataset

ATModel can perform five tasks simultaneously: Panoptic Segmentation, Captioning, VQA, Depth Estimation and OCR. The datasets used for each task are listed below.

Panoptic Segmentation -> ADE20K
Captioning -> VizWiz_Cap
VQA -> VizWiz_VQA
Depth Estimation -> NYU_v2
OCR -> MJSynth (MJ), SynthText (ST), ICDAR_2013 (IC13), ICDAR_2015 (IC15), IIIT5K-Words (IIIT5K), Street View Text (SVT), Street View Text-Perspective (SVTP), CUTE80 (CUTE)
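All five datasets share a single `.atmodel_data/` root, with one sub-directory per task as shown in the layouts in this document. The sketch below is not part of ATModel; it only checks that each task's root directory exists before you start training. The `DATASET_ROOTS` mapping is copied from the trees in this file, and `missing_dataset_roots` is a hypothetical helper name.

```python
from pathlib import Path

# Task -> dataset root relative to .atmodel_data/ (copied verbatim from the trees in this file).
DATASET_ROOTS = {
    "panoptic_segmentation": "seg_datatsets/ADEChallengeData2016",
    "captioning": "captioning_datasets/vizwiz",
    "vqa": "vqa_datasets/vizwiz",
    "depth_estimation": "depth_datasets/nyuv2",
    "ocr": "ocr_datasets",
}


def missing_dataset_roots(data_root=".atmodel_data"):
    """Hypothetical helper: return the (sorted) tasks whose dataset directory is absent."""
    base = Path(data_root)
    return sorted(task for task, rel in DATASET_ROOTS.items() if not (base / rel).is_dir())


if __name__ == "__main__":
    missing = missing_dataset_roots()
    print("All dataset roots found." if not missing else f"Missing: {missing}")
```

The `seg_datatsets` spelling is kept exactly as it appears in the ADE20K tree; if your checkout expects a different directory name, change it here as well.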

Panoptic Segmentation

Expected dataset structure for ADE20K

.atmodel_data/
└── seg_datatsets/ADEChallengeData2016/
    └── images/
        └── training/
            ├── ADE_train_00000001.jpg
            ├── ...
        └── validation/
            ├── ADE_val_00000001.jpg
            ├── ...
    └── ade20k_panoptic_train/
        ├── ADE_train_00000001.png
        ├── ...
    └── ade20k_panoptic_val/
        ├── ADE_val_00000001.png
        ├── ...
    ├── ade20k_panoptic_train.json
    ├── ade20k_panoptic_val.json
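A quick way to catch an incomplete ADE20K download is to check that every file referenced by the panoptic JSON is actually on disk. The sketch below assumes `ade20k_panoptic_<split>.json` follows the COCO panoptic format (a top-level `images` list with JPEG `file_name`s and an `annotations` list with PNG `file_name`s), which is what the detectron2/Mask2Former ADE20K preparation scripts produce for this layout; `check_ade20k_panoptic` is a hypothetical name, not an ATModel function.

```python
import json
from pathlib import Path


def check_ade20k_panoptic(root, split="train"):
    """Return (num_annotations, missing_paths) for one ADE20K panoptic split.

    Assumes COCO panoptic JSON: "images" entries name the RGB JPEGs under
    images/training or images/validation, "annotations" entries name the
    panoptic PNGs under ade20k_panoptic_<split>/.
    """
    root = Path(root)
    image_dir = root / "images" / ("training" if split == "train" else "validation")
    pan_dir = root / f"ade20k_panoptic_{split}"
    with open(root / f"ade20k_panoptic_{split}.json") as f:
        data = json.load(f)

    # Every RGB image and every panoptic PNG listed in the JSON must exist.
    expected = [image_dir / img["file_name"] for img in data["images"]]
    expected += [pan_dir / ann["file_name"] for ann in data["annotations"]]
    missing = [str(p) for p in expected if not p.is_file()]
    return len(data["annotations"]), missing


if __name__ == "__main__":
    root = ".atmodel_data/seg_datatsets/ADEChallengeData2016"
    for split in ("train", "val"):
        n, missing = check_ade20k_panoptic(root, split)
        print(f"{split}: {n} annotations, {len(missing)} missing files")
```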

Captioning

Expected dataset structure for VizWiz_Cap

.atmodel_data/
└── captioning_datasets/vizwiz/
    └── train/
        ├── VizWiz_train_00000000.jpg
        ├── ...
    └── val/
        ├── VizWiz_val_00000000.jpg
        ├── ...
    └── test/
        ├── VizWiz_test_00000000.jpg
        ├── ...
    └── annotations/
        ├── train.json
        ├── val.json
        ├── test.json

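Once the captioning files are in place, the annotations can be grouped per image for training or evaluation. This is a minimal sketch, assuming `annotations/<split>.json` is COCO-style (`images` with `id`/`file_name`, `annotations` with `image_id`/`caption`) as in the VizWiz-Captions release; the optional `is_rejected`/`is_precanned` flags are treated as false when absent. `load_vizwiz_captions` is a hypothetical helper, not ATModel's own loader.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_vizwiz_captions(root, split="train", drop_flagged=True):
    """Map each image path under <root>/<split>/ to its list of reference captions."""
    root = Path(root)
    with open(root / "annotations" / f"{split}.json") as f:
        data = json.load(f)
    id_to_path = {img["id"]: root / split / img["file_name"] for img in data["images"]}

    captions = defaultdict(list)
    # .get(): a split file without captions simply yields an empty mapping.
    for ann in data.get("annotations", []):
        flagged = ann.get("is_rejected", False) or ann.get("is_precanned", False)
        if drop_flagged and flagged:
            continue
        captions[id_to_path[ann["image_id"]]].append(ann["caption"])
    return dict(captions)


if __name__ == "__main__":
    caps = load_vizwiz_captions(".atmodel_data/captioning_datasets/vizwiz", "val")
    print(f"{len(caps)} captioned images")
```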
VQA

Expected dataset structure for VizWiz_VQA

.atmodel_data/
└── vqa_datasets/vizwiz/
    └── train/
        ├── VizWiz_train_00000000.jpg
        ├── ...
    └── val/
        ├── VizWiz_val_00000000.jpg
        ├── ...
    └── test/
        ├── VizWiz_test_00000000.jpg
        ├── ...
    └── annotations/
        ├── train.json
        ├── val.json
        ├── test.json

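The VQA annotations pair each image with a question and several crowd-sourced answers. The sketch below assumes `annotations/<split>.json` is a list of records with `image`, `question` and an `answers` list of `{"answer": ...}` dicts, as in the VizWiz-VQA release. It also shows a simplified form of the VQA accuracy metric, min(#annotators who gave the prediction / 3, 1); the official evaluation additionally normalizes answers and averages over annotator subsets. Both function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_vizwiz_vqa(root, split="train"):
    """Yield (image_path, question, answer_counts) for every record in the split."""
    root = Path(root)
    with open(root / "annotations" / f"{split}.json") as f:
        records = json.load(f)
    for rec in records:
        # Light normalization only: strip whitespace and lowercase each answer.
        counts = Counter(a["answer"].strip().lower() for a in rec.get("answers", []))
        yield root / split / rec["image"], rec["question"], counts


def soft_vqa_score(prediction, answer_counts):
    """Simplified VQA accuracy: min(#matching annotator answers / 3, 1)."""
    return min(answer_counts[prediction.strip().lower()] / 3.0, 1.0)


if __name__ == "__main__":
    items = list(load_vizwiz_vqa(".atmodel_data/vqa_datasets/vizwiz", "val"))
    print(f"{len(items)} questions loaded")
```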
Depth Estimation

Please follow the instructions in BTS to download the NYU_v2 dataset, then organize it as shown below.

.atmodel_data/
└── depth_datasets/nyuv2/
    └── images/
        ├── 0.jpg
        ├── ...
    └── raw_depths/
        ├── 0.png
        ├── ...
    ├── train.txt
    ├── val.txt
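In this layout each sample is an RGB image in `images/` with a depth map of the same stem in `raw_depths/`, and `train.txt`/`val.txt` select the samples for each split. The exact split-file format is not documented here, so the sketch below assumes, hypothetically, one sample per line whose first token is either a stem (`0`) or an image file name (`0.jpg`); adjust the parser if your files differ. `load_nyuv2_split` is a hypothetical helper.

```python
from pathlib import Path


def load_nyuv2_split(root, split="train"):
    """Return (rgb_path, depth_path) pairs for the samples listed in <split>.txt.

    HYPOTHETICAL split format: one sample per line, first token is a stem or an
    image file name. RGB is images/<stem>.jpg, depth is raw_depths/<stem>.png.
    Raises FileNotFoundError if either file of a listed sample is missing.
    """
    root = Path(root)
    pairs = []
    for line in (root / f"{split}.txt").read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        stem = Path(line.split()[0]).stem  # "0" and "0.jpg" both give "0"
        rgb = root / "images" / f"{stem}.jpg"
        depth = root / "raw_depths" / f"{stem}.png"
        for path in (rgb, depth):
            if not path.is_file():
                raise FileNotFoundError(path)
        pairs.append((rgb, depth))
    return pairs


if __name__ == "__main__":
    pairs = load_nyuv2_split(".atmodel_data/depth_datasets/nyuv2", "train")
    print(f"{len(pairs)} training pairs")
```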

OCR

Please follow the instructions in ABINet to download the OCR datasets, then organize them as shown below.

.atmodel_data/
└── ocr_datasets/
    └── training/
        └── MJ/
            └── MJ_train/
            └── MJ_valid/
            └── MJ_test/
        └── ST/
    └── evaluation/
        └── IIIT5k_3000/
        └── SVT/
        └── SVTP/
        └── IC13_857/
        └── IC15_1811/
        └── CUTE80/
    └── data/
        ├── charset_vn.txt
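ABINet distributes its preprocessed training and evaluation sets as LMDB databases, so each leaf folder in the tree above is expected to contain an LMDB `data.mdb` file. The sketch below assumes exactly that and lists any leaf folder without a `data.mdb`, plus the charset file if it is absent. The folder names are copied from the tree; `missing_ocr_files` is a hypothetical helper, not part of ATModel or ABINet.

```python
from pathlib import Path

# Leaf LMDB folders expected under ocr_datasets/ (names copied from the tree above).
EXPECTED_LMDBS = [
    "training/MJ/MJ_train", "training/MJ/MJ_valid", "training/MJ/MJ_test",
    "training/ST",
    "evaluation/IIIT5k_3000", "evaluation/SVT", "evaluation/SVTP",
    "evaluation/IC13_857", "evaluation/IC15_1811", "evaluation/CUTE80",
]


def missing_ocr_files(root=".atmodel_data/ocr_datasets"):
    """Return expected LMDB folders lacking data.mdb, plus the charset file if absent."""
    root = Path(root)
    missing = [rel for rel in EXPECTED_LMDBS if not (root / rel / "data.mdb").is_file()]
    if not (root / "data" / "charset_vn.txt").is_file():
        missing.append("data/charset_vn.txt")
    return missing


if __name__ == "__main__":
    missing = missing_ocr_files()
    print("OCR data complete." if not missing else f"Missing: {missing}")
```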
