This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

[feature] Add Region feature configs and extraction file #3

Open: wants to merge 3 commits into `main`
49 changes: 47 additions & 2 deletions README.md
@@ -52,13 +52,50 @@ The final model by default should be saved under `./output` of your current work
We also release the configuration (`configs/R-50-updn.yaml`) for training the region features described in the **bottom-up-attention** paper, which is a faithful re-implementation of the original [one](https://github.com/peteanderson80/bottom-up-attention) in Detectron2.

## Feature Extraction

### Grid Features

Once the model is trained (or after downloading one of our pre-trained models, see below), grid features can be extracted by running:
```bash
python extract_grid_feature.py --config-file configs/R-50-grid.yaml --dataset <dataset>
```
The script loads the final model from `cfg.OUTPUT_DIR` (which can be overridden on the command line) and extracts features for `<dataset>`. Three dataset options are provided, `coco_2014_train`, `coco_2014_val` and `coco_2015_test`, which correspond to the `train`, `val` and `test` splits of the VQA dataset. The extracted features can be conveniently loaded in [MMF](https://github.com/facebookresearch/mmf).
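
The three dataset options can also be processed in one go. The Python sketch below only builds (and, if asked, runs) the extraction command once per split; `VQA_SPLITS`, `build_grid_command` and `extract_all_splits` are hypothetical helpers that are not part of the repository, and the sketch assumes it is run from the repository root.

```python
import subprocess
import sys

# Dataset names accepted by extract_grid_feature.py, mapped to the VQA split
# each one corresponds to (as listed in the README text).
VQA_SPLITS = {
    "coco_2014_train": "train",
    "coco_2014_val": "val",
    "coco_2015_test": "test",
}


def build_grid_command(config_file, dataset):
    """Return the argv for one extract_grid_feature.py run (hypothetical helper)."""
    if dataset not in VQA_SPLITS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(VQA_SPLITS)}")
    return [
        sys.executable,
        "extract_grid_feature.py",
        "--config-file", config_file,
        "--dataset", dataset,
    ]


def extract_all_splits(config_file="configs/R-50-grid.yaml", dry_run=True):
    """Build one command per VQA split; execute them unless dry_run is set."""
    commands = [build_grid_command(config_file, name) for name in VQA_SPLITS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing split
    return commands


if __name__ == "__main__":
    extract_all_splits()  # dry run: only prints the commands
```

Keeping `dry_run=True` as the default makes it easy to check the generated commands before launching long extraction jobs.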

To extract features on a custom dataset, you can either dump the image information into [COCO](http://cocodataset.org/) `.json` format and add the dataset information so that `extract_grid_feature.py` can use it, or modify `extract_grid_feature.py` to loop over the images directly.
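
For the first option, here is a minimal sketch of dumping image information into a COCO-style `.json` file. It assumes that per-image metadata (`id`, `file_name`, `width`, `height`) with empty `annotations` and `categories` is enough for feature extraction, which is not confirmed here. `write_coco_image_info` is a hypothetical helper, and the resulting file would still need to be added to Detectron2's dataset catalog (for instance with `detectron2.data.datasets.register_coco_instances`) before `extract_grid_feature.py` could use it.

```python
import json
import os
import tempfile


def write_coco_image_info(images, json_path):
    """Write a minimal COCO-format file describing images only (no annotations).

    `images` is an iterable of (file_name, width, height) tuples; image ids are
    assigned sequentially starting from 1.
    """
    coco = {
        "images": [
            {"id": idx, "file_name": file_name, "width": width, "height": height}
            for idx, (file_name, width, height) in enumerate(images, start=1)
        ],
        "annotations": [],  # assumed: extraction needs no ground-truth boxes
        "categories": [],
    }
    with open(json_path, "w") as f:
        json.dump(coco, f)
    return coco


if __name__ == "__main__":
    # Hypothetical image list; in practice, read the sizes from the actual files.
    sample = [("0001.jpg", 640, 480), ("0002.jpg", 800, 600)]
    out = os.path.join(tempfile.gettempdir(), "my_dataset.json")
    info = write_coco_image_info(sample, out)
    print(len(info["images"]), "images written to", out)
```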

### Region Features

To extract region features, run the `extract_region_feature.py` script:

```bash
python extract_region_feature.py --config-file configs/X-152-region-c4.yaml --dataset <dataset>
```

As with grid features, the script loads the final model from `cfg.OUTPUT_DIR`. You can also point it directly at the folder containing your dataset's images:

```bash
python extract_region_feature.py \
--config-file configs/X-152-region-c4.yaml \
--dataset <dataset_name> \
--dataset-path <path_to_dataset_images_dir>
```
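
Both invocations share the same flags apart from the optional `--dataset-path`. A small Python sketch that assembles either form, for example to pass to `subprocess.run`; `build_region_command` is a hypothetical helper, and the dataset name and image directory in the usage block are placeholders.

```python
import shlex
import sys


def build_region_command(config_file, dataset, dataset_path=None):
    """Return argv for extract_region_feature.py (hypothetical helper).

    --dataset-path is appended only when an image directory is given,
    mirroring the two invocations documented in the README.
    """
    cmd = [
        sys.executable,
        "extract_region_feature.py",
        "--config-file", config_file,
        "--dataset", dataset,
    ]
    if dataset_path is not None:
        cmd += ["--dataset-path", dataset_path]
    return cmd


if __name__ == "__main__":
    # Placeholder dataset name and image directory, for illustration only.
    cmd = build_region_command(
        "configs/X-152-region-c4.yaml", "my_dataset", "/data/my_dataset/images"
    )
    print(shlex.join(cmd))  # run with subprocess.run(cmd, check=True)
```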

The features are saved in `.npy` format, as a dictionary with the following fields:

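```json
{
  "bbox": "<extracted bounding boxes>",
  "num_boxes": "<number of boxes>",
  "objects": "<objects in the bounding boxes>",
  "image_height": "<image height>",
  "image_width": "<image width>",
  "cls_prob": "<class probabilities of the objects>",
  "features": "<features of each bounding box>"
}
```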

`bbox` contains all the extracted bounding boxes, `cls_prob` contains the class probabilities of the `objects` present in those boxes, and `features` contains the extracted features of each bounding box.
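
Because `numpy.save` stores a Python dictionary as a pickled 0-d object array, such a file has to be loaded with `allow_pickle=True` and unwrapped with `.item()`. A minimal sketch, assuming the script writes each dictionary with `numpy.save`; `load_region_features` is a hypothetical helper, and its per-box consistency check assumes that `bbox`, `cls_prob` and `features` each have one row per box.

```python
import os
import tempfile

import numpy as np

# Field names listed in the README for each saved .npy file.
EXPECTED_KEYS = {
    "bbox", "num_boxes", "objects", "image_height",
    "image_width", "cls_prob", "features",
}


def load_region_features(path):
    """Load one region-feature dictionary saved with np.save (assumed format)."""
    # np.save stores a dict as a 0-d object array, so pickle loading must be
    # enabled explicitly and .item() unwraps the original dict.
    data = np.load(path, allow_pickle=True).item()
    missing = EXPECTED_KEYS - set(data)
    if missing:
        raise KeyError(f"{path}: missing fields {sorted(missing)}")
    # Assumption: bbox, cls_prob and features have one row per box.
    n = int(data["num_boxes"])
    for key in ("bbox", "cls_prob", "features"):
        if len(data[key]) != n:
            raise ValueError(f"{path}: {key} has {len(data[key])} rows, expected {n}")
    return data


if __name__ == "__main__":
    # Round-trip a synthetic record with made-up shapes to show the pattern.
    n = 3
    record = {
        "bbox": np.zeros((n, 4), dtype=np.float32),
        "num_boxes": n,
        "objects": np.zeros(n, dtype=np.int64),
        "image_height": 480,
        "image_width": 640,
        "cls_prob": np.zeros((n, 5), dtype=np.float32),
        "features": np.zeros((n, 8), dtype=np.float32),
    }
    path = os.path.join(tempfile.mkdtemp(), "example.npy")
    np.save(path, record)
    loaded = load_region_features(path)
    print(loaded["bbox"].shape, loaded["features"].shape)
```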

## Pre-Trained Models and Features
We release several pre-trained models for grid features: one with an R-50 backbone, one with X-101, one with X-152, and one with additional improvements used for the 2020 VQA Challenge (see `X-152-challenge.yaml`). The models can be used directly to extract features. For your convenience, we also release the pre-extracted features for direct download.
@@ -70,6 +107,14 @@ We release several pre-trained models for grid features: one with R-50 backbone,
| X-152 | 4.7 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152/X-152.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152/metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152/X-152-features.tgz">features</a> |
| X-152++ | 3.7 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152pp/X-152pp.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152pp/metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152pp/X-152pp-features.tgz">features</a> |

We also release pre-trained models for region features, all with an X-152 backbone, in three variants: C4, DC5 and FPN.

| Backbone | AP<sub>50:95</sub> | Download |
| -------- | ---- | -------- |
| X-152-region-FPN | 5.25 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-fpn-X-152/region-fpn-X-152.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-fpn-X-152/fpn-X-152-metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-fpn-X-152/region-fpn-X-152-features_fc7.tar.gz">features (fc7)</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-fpn-X-152/region-fpn-X-152-features.tar.gz">features (fc6)</a> |
| X-152-region-DC5 | 5.60 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-dc5-X-152/region-dc5-X-152.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-dc5-X-152/dc5-X-152-metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-dc5-X-152/region-dc5-X-152-features_fc7.tar.gz">features (fc7)</a> |
| X-152-region-C4 | 5.67 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-c4-X-152/region-c4-X-152.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-c4-X-152/c4-X-152-metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-c4-X-152/region-c4-X-152-features.tar.gz">features</a> |
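
The region config files (`configs/X-152-region-<variant>.yaml`) and the checkpoint URLs in the table above follow a shared naming scheme. A hedged Python sketch that derives both from the variant name; `region_model_files` is a hypothetical helper that only builds names. It does not download anything, and how a downloaded checkpoint is handed to `extract_region_feature.py` is not specified here.

```python
# Region-feature variants in this PR; each pairs a config file under configs/
# with a checkpoint whose URL follows the pattern used in the README table.
REGION_VARIANTS = ("c4", "dc5", "fpn")
BASE_URL = "https://dl.fbaipublicfiles.com/grid-feats-vqa"


def region_model_files(variant):
    """Return (config path, checkpoint URL) for an X-152 region model (hypothetical helper)."""
    variant = variant.lower()
    if variant not in REGION_VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; choose from {REGION_VARIANTS}")
    config = f"configs/X-152-region-{variant}.yaml"
    weights = f"{BASE_URL}/region-{variant}-X-152/region-{variant}-X-152.pth"
    return config, weights


if __name__ == "__main__":
    for v in REGION_VARIANTS:
        print(*region_model_files(v))
```

Only the model URLs share this pattern; the metrics and feature archives use slightly different file names, so they are not derived here.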

## License

The code is released under the [Apache 2.0 license](LICENSE).
2 changes: 1 addition & 1 deletion configs/Base-RCNN-grid.yaml
@@ -15,7 +15,7 @@ MODEL:
    IN_FEATURES: ["res5"]
    NUM_CLASSES: 1600
  ROI_BOX_HEAD:
    NAME: "AttributeFastRCNNConvFCHead"
    NUM_FC: 2
    POOLER_RESOLUTION: 1
    POOLER_SAMPLING_RATIO: 2
31 changes: 31 additions & 0 deletions configs/Base-RCNN-region-c4.yaml
@@ -0,0 +1,31 @@
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  ATTRIBUTE_ON: True
  RPN:
    PRE_NMS_TOPK_TEST: 6000
    POST_NMS_TOPK_TEST: 1000
    SMOOTH_L1_BETA: 0.1111
    BOUNDARY_THRESH: 0
  ROI_HEADS:
    NAME: "AttributeRes5ROIHeads"
    NUM_CLASSES: 1600
  ROI_BOX_HEAD:
    NAME: "FastRCNNConvFCHead"
    NUM_FC: 2
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 2
    SMOOTH_L1_BETA: 1.
DATASETS:
  TRAIN: ("visual_genome_train", "visual_genome_val")
  TEST: ("visual_genome_test",)
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.02
  STEPS: (60000, 80000)
  MAX_ITER: 90000
INPUT:
  MIN_SIZE_TRAIN: (600,)
  MAX_SIZE_TRAIN: 1000
  MIN_SIZE_TEST: 600
  MAX_SIZE_TEST: 1000
VERSION: 2
37 changes: 37 additions & 0 deletions configs/Base-RCNN-region-dc5.yaml
@@ -0,0 +1,37 @@
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  ATTRIBUTE_ON: True
  RESNETS:
    OUT_FEATURES: ["res5"]
    RES5_DILATION: 2
  RPN:
    IN_FEATURES: ["res5"]
    PRE_NMS_TOPK_TEST: 6000
    POST_NMS_TOPK_TEST: 1000
    SMOOTH_L1_BETA: 0.1111
    BOUNDARY_THRESH: 0
  ROI_HEADS:
    NAME: "AttributeStandardROIHeads"
    IN_FEATURES: ["res5"]
    NUM_CLASSES: 1600
  ROI_BOX_HEAD:
    NAME: "AttributeFastRCNNConvFCHead"
    NUM_FC: 2
    FC_DIM: 2048
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 2
    SMOOTH_L1_BETA: 1.
DATASETS:
  TRAIN: ("visual_genome_train", "visual_genome_val")
  TEST: ("visual_genome_test",)
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.02
  STEPS: (60000, 80000)
  MAX_ITER: 90000
INPUT:
  MIN_SIZE_TRAIN: (600,)
  MAX_SIZE_TRAIN: 1000
  MIN_SIZE_TEST: 600
  MAX_SIZE_TEST: 1000
VERSION: 2
45 changes: 45 additions & 0 deletions configs/Base-RCNN-region-fpn.yaml
@@ -0,0 +1,45 @@
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  BACKBONE:
    NAME: "build_resnet_fpn_backbone"
  ATTRIBUTE_ON: True
  RESNETS:
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
  FPN:
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
  ANCHOR_GENERATOR:
    SIZES: [[32], [64], [128], [256], [512]]  # One size for each in feature map
    ASPECT_RATIOS: [[0.5, 1.0, 2.0]]  # Three aspect ratios (same for all in feature maps)
  RPN:
    IN_FEATURES: ["p2", "p3", "p4", "p5", "p6"]
    PRE_NMS_TOPK_TRAIN: 2000  # Per FPN level
    PRE_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    POST_NMS_TOPK_TEST: 1000
    SMOOTH_L1_BETA: 0.1111
    BOUNDARY_THRESH: 0
  ROI_HEADS:
    NAME: "AttributeStandardROIHeads"
    IN_FEATURES: ["p2", "p3", "p4", "p5"]
    NUM_CLASSES: 1600
  ROI_BOX_HEAD:
    NAME: "AttributeFastRCNNConvFCHead"
    NUM_FC: 2
    FC_DIM: 2048
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 2
    SMOOTH_L1_BETA: 1.
DATASETS:
  TRAIN: ("visual_genome_train", "visual_genome_val")
  TEST: ("visual_genome_test",)
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.02
  STEPS: (60000, 80000)
  MAX_ITER: 90000
INPUT:
  MIN_SIZE_TRAIN: (600,)
  MAX_SIZE_TRAIN: 1000
  MIN_SIZE_TEST: 600
  MAX_SIZE_TEST: 1000
VERSION: 2
5 changes: 0 additions & 5 deletions configs/R-50-updn.yaml

This file was deleted.

8 changes: 8 additions & 0 deletions configs/X-152-region-c4.yaml
@@ -0,0 +1,8 @@
_BASE_: "Base-RCNN-region-c4.yaml"
MODEL:
  WEIGHTS: "catalog://ImageNetPretrained/FAIR/X-152-32x8d-IN5k"
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 152
8 changes: 8 additions & 0 deletions configs/X-152-region-dc5.yaml
@@ -0,0 +1,8 @@
_BASE_: "Base-RCNN-region-dc5.yaml"
MODEL:
  WEIGHTS: "catalog://ImageNetPretrained/FAIR/X-152-32x8d-IN5k"
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 152
8 changes: 8 additions & 0 deletions configs/X-152-region-fpn.yaml
@@ -0,0 +1,8 @@
_BASE_: "Base-RCNN-region-fpn.yaml"
MODEL:
  WEIGHTS: "catalog://ImageNetPretrained/FAIR/X-152-32x8d-IN5k"
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 152