facebookresearch · vedanuj · Jul 5, 2020 · Jul 5, 2020 · Jul 18, 2020 · endernewton
diff --git a/README.md b/README.md
@@ -52,13 +52,50 @@ The final model by default should be saved under `./output` of your current work
 We also release the configuration (`configs/R-50-updn.yaml`) for training the region features described in the **bottom-up-attention** paper, which is a faithful re-implementation of the original [one](https://github.com/peteanderson80/bottom-up-attention) in Detectron2.
 
 ## Feature Extraction
+
+### Grid Features
+
 Once the model is trained (or after downloading one of our pre-trained models, see below), grid features can be extracted by running:
 ```bash
 python extract_grid_feature.py --config-file configs/R-50-grid.yaml --dataset <dataset>
 ```
-and the code will load the final model from `cfg.OUTPUT_DIR` (which one can override in command line) and start extracting features for `<dataset>`, we provide three options for the dataset: `coco_2014_train`, `coco_2014_val` and `coco_2015_test`, they correspond to `train`, `val` and `test` splits of the VQA dataset. The extracted features can be conveniently loaded in [Pythia](https://github.com/facebookresearch/pythia).
+The code loads the final model from `cfg.OUTPUT_DIR` (which can be overridden on the command line) and extracts features for `<dataset>`. Three dataset options are provided: `coco_2014_train`, `coco_2014_val`, and `coco_2015_test`, corresponding to the `train`, `val`, and `test` splits of the VQA dataset. The extracted features can be loaded directly in [MMF](https://github.com/facebookresearch/mmf).
+
+To extract features on a custom dataset, either dump the image information into [COCO](http://cocodataset.org/) `.json` format and add the dataset information so that `extract_grid_feature.py` can use it, or modify `extract_grid_feature.py` to loop over the images directly.
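
As a rough illustration of the COCO-style `.json` mentioned above, the sketch below writes only the image-level fields (`id`, `file_name`, `width`, `height`) and leaves `annotations` and `categories` empty, on the assumption that feature extraction only needs the image list. The helper name `build_coco_image_index` and the example file names are hypothetical. Registering the resulting file (for instance through Detectron2's `register_coco_instances`) is a separate step that is not shown here.

```python
import json


def build_coco_image_index(records):
    """Build a minimal COCO-style dict from (file_name, width, height) tuples.

    Hypothetical helper: only image metadata is filled in; annotations and
    categories stay empty because no labels are needed to extract features.
    """
    images = [
        {"id": image_id, "file_name": file_name, "width": width, "height": height}
        for image_id, (file_name, width, height) in enumerate(records, start=1)
    ]
    return {"images": images, "annotations": [], "categories": []}


# Example records; the file names and sizes are made up for illustration.
records = [("img_0001.jpg", 640, 480), ("img_0002.jpg", 500, 375)]
coco_index = build_coco_image_index(records)

# Serialize to the text that would be written to the dataset's .json file.
coco_json = json.dumps(coco_index, indent=2)
```

In practice the width and height would be read from the image files themselves; they are passed in here to keep the sketch free of image-library dependencies.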
 
-To extract features on your customized dataset, you may want to dump the image information into [COCO](http://cocodataset.org/) `.json` format, and add the dataset information to use `extract_grid_feature.py`, or you can hack `extract_grid_feature.py` and directly loop over images. 
+### Region Features
+
+To extract region features, run the `extract_region_feature.py` script:
+
+```bash
+python extract_region_feature.py --config-file configs/X-152-region-c4.yaml --dataset <dataset>
+```
+
+The code loads the final model from `cfg.OUTPUT_DIR`. To point the script directly at the image folder of your dataset, also pass `--dataset-path`:
+
+```bash
+python extract_region_feature.py \
+  --config-file configs/X-152-region-c4.yaml \
+  --dataset <dataset_name> \
+  --dataset-path <path_to_dataset_images_dir>
+```
+
+The features are saved in `.npy` format; each file holds a dictionary with the following fields:
+
+```json
+{
+  "bbox": ,
+  "num_boxes": ,
+  "objects": ,
+  "image_height": ,
+  "image_width": ,
+  "cls_prob": ,
+  "features": ,
+}
+```
+
+`bbox` contains all the extracted bounding boxes, `cls_prob` contains the class probabilities of the `objects` in those boxes, and `features` contains the extracted features of each bounding box.
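
A dictionary stored through `np.save` comes back as a 0-d object array, so reading such a file typically needs `allow_pickle=True` followed by `.item()`. The sketch below writes a dummy file with the field names listed above and reads it back. The array shapes (3 boxes, 2048-d features, 1600 classes) and the helper name `load_region_features` are assumptions for illustration, not a description of the script's actual output.

```python
import os
import tempfile

import numpy as np


def load_region_features(path):
    """Load one saved feature file and return it as a plain dict.

    np.save wraps a dict in a 0-d object array, so pickling must be
    allowed on load and .item() unwraps the dict again.
    """
    return np.load(path, allow_pickle=True).item()


# Dummy stand-in for a real output file; all shapes here are assumed.
num_boxes, feat_dim, num_classes = 3, 2048, 1600
dummy = {
    "bbox": np.zeros((num_boxes, 4), dtype=np.float32),
    "num_boxes": num_boxes,
    "objects": np.zeros(num_boxes, dtype=np.int64),
    "image_height": 480,
    "image_width": 640,
    "cls_prob": np.zeros((num_boxes, num_classes), dtype=np.float32),
    "features": np.zeros((num_boxes, feat_dim), dtype=np.float32),
}

path = os.path.join(tempfile.mkdtemp(), "example.npy")
np.save(path, dummy)

info = load_region_features(path)
features = info["features"]  # one row per bounding box
```

Only load files from a trusted source this way, since `allow_pickle=True` can execute arbitrary code embedded in a malicious file.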
 
 ## Pre-Trained Models and Features
 We release several pre-trained models for grid features: one with R-50 backbone, one with X-101, one with X-152, and one with additional improvements used for the 2020 VQA Challenge (see `X-152-challenge.yaml`). The models can be used directly to extract features. For your convenience, we also release the pre-extracted features for direct download.
@@ -70,6 +107,14 @@ We release several pre-trained models for grid features: one with R-50 backbone,
 | X-152    | 4.7 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152/X-152.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152/metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152/X-152-features.tgz">features</a> |
 | X-152++  | 3.7 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152pp/X-152pp.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152pp/metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/X-152pp/X-152pp-features.tgz">features</a> |
 
+We also release pre-trained models for region features, all with an X-152 backbone: one with C4, one with DC5, and one with FPN:
+
+| Backbone | AP<sub>50:95</sub> | Download |
+| -------- | ---- | -------- |
+| X-152-region-FPN    | 5.25 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-fpn-X-152/region-fpn-X-152.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-fpn-X-152/fpn-X-152-metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-fpn-X-152/region-fpn-X-152-features_fc7.tar.gz">features (fc7)</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-fpn-X-152/region-fpn-X-152-features.tar.gz">features (fc6)</a> |
+| X-152-region-DC5    | 5.60 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-dc5-X-152/region-dc5-X-152.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-dc5-X-152/dc5-X-152-metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-dc5-X-152/region-dc5-X-152-features_fc7.tar.gz">features (fc7)</a> |
+| X-152-region-C4  | 5.67 | <a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-c4-X-152/region-c4-X-152.pth">model</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-c4-X-152/c4-X-152-metrics.json">metrics</a>&nbsp;\| &nbsp;<a href="https://dl.fbaipublicfiles.com/grid-feats-vqa/region-c4-X-152/region-c4-X-152-features.tar.gz">features</a> |
+
 ## License
 
 The code is released under the [Apache 2.0 license](LICENSE).
diff --git a/configs/Base-RCNN-grid.yaml b/configs/Base-RCNN-grid.yaml
@@ -15,7 +15,7 @@ MODEL:
     IN_FEATURES: ["res5"]
     NUM_CLASSES: 1600
   ROI_BOX_HEAD:
-    NAME: "FastRCNNConvFCHead"
+    NAME: "AttributeFastRCNNConvFCHead"
     NUM_FC: 2
     POOLER_RESOLUTION: 1
     POOLER_SAMPLING_RATIO: 2

diff --git a/configs/Base-RCNN-region-c4.yaml b/configs/Base-RCNN-region-c4.yaml
@@ -0,0 +1,31 @@
+MODEL:
+  META_ARCHITECTURE: "GeneralizedRCNN"
+  ATTRIBUTE_ON: True
+  RPN:
+    PRE_NMS_TOPK_TEST: 6000
+    POST_NMS_TOPK_TEST: 1000
+    SMOOTH_L1_BETA: 0.1111
+    BOUNDARY_THRESH: 0
+  ROI_HEADS:
+    NAME: "AttributeRes5ROIHeads"
+    NUM_CLASSES: 1600
+  ROI_BOX_HEAD:
+    NAME: "FastRCNNConvFCHead"
+    NUM_FC: 2
+    POOLER_RESOLUTION: 7
+    POOLER_SAMPLING_RATIO: 2
+    SMOOTH_L1_BETA: 1.
+DATASETS:
+  TRAIN: ("visual_genome_train", "visual_genome_val")
+  TEST: ("visual_genome_test",)
+SOLVER:
+  IMS_PER_BATCH: 16
+  BASE_LR: 0.02
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+INPUT:
+  MIN_SIZE_TRAIN: (600,)
+  MAX_SIZE_TRAIN: 1000
+  MIN_SIZE_TEST: 600
+  MAX_SIZE_TEST: 1000
+VERSION: 2
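
The `SOLVER` block above describes a step schedule. A minimal sketch of the learning rate it implies, assuming Detectron2's default decay factor of 0.1 at each step (`SOLVER.GAMMA`, which this config does not set) and ignoring the warmup phase; `lr_at` is a hypothetical helper:

```python
from bisect import bisect_right

BASE_LR = 0.02           # SOLVER.BASE_LR from the config
STEPS = (60000, 80000)   # SOLVER.STEPS from the config
MAX_ITER = 90000         # SOLVER.MAX_ITER from the config
GAMMA = 0.1              # assumption: default decay factor, not set in the config


def lr_at(iteration):
    """Return the learning rate at a given iteration, without warmup."""
    if not 0 <= iteration < MAX_ITER:
        raise ValueError("iteration must lie in [0, MAX_ITER)")
    # Count how many decay steps have been reached and apply GAMMA that often.
    return BASE_LR * GAMMA ** bisect_right(STEPS, iteration)


# Sample the schedule just before and at each decay step.
schedule = {it: lr_at(it) for it in (0, 59999, 60000, 79999, 80000, 89999)}
```

Under these assumptions the rate stays at 0.02 until iteration 60000, drops by a factor of 10 there, and drops by another factor of 10 at iteration 80000.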
diff --git a/configs/Base-RCNN-region-dc5.yaml b/configs/Base-RCNN-region-dc5.yaml
@@ -0,0 +1,37 @@
+MODEL:
+  META_ARCHITECTURE: "GeneralizedRCNN"
+  ATTRIBUTE_ON: True
+  RESNETS:
+    OUT_FEATURES: ["res5"]
+    RES5_DILATION: 2
+  RPN:
+    IN_FEATURES: ["res5"]
+    PRE_NMS_TOPK_TEST: 6000
+    POST_NMS_TOPK_TEST: 1000
+    SMOOTH_L1_BETA: 0.1111
+    BOUNDARY_THRESH: 0
+  ROI_HEADS:
+    NAME: "AttributeStandardROIHeads"
+    IN_FEATURES: ["res5"]
+    NUM_CLASSES: 1600
+  ROI_BOX_HEAD:
+    NAME: "AttributeFastRCNNConvFCHead"
+    NUM_FC: 2
+    FC_DIM: 2048
+    POOLER_RESOLUTION: 7
+    POOLER_SAMPLING_RATIO: 2
+    SMOOTH_L1_BETA: 1.
+DATASETS:
+  TRAIN: ("visual_genome_train", "visual_genome_val")
+  TEST: ("visual_genome_test",)
+SOLVER:
+  IMS_PER_BATCH: 16
+  BASE_LR: 0.02
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+INPUT:
+  MIN_SIZE_TRAIN: (600,)
+  MAX_SIZE_TRAIN: 1000
+  MIN_SIZE_TEST: 600
+  MAX_SIZE_TEST: 1000
+VERSION: 2
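
The `INPUT` block above bounds the image size: the shorter side is scaled to 600 pixels unless that would push the longer side past 1000 pixels. A minimal sketch of that rule, assuming the usual shortest-edge resizing with the aspect ratio preserved; `resized_shape` is a hypothetical helper, not a Detectron2 function:

```python
def resized_shape(height, width, min_size=600, max_size=1000):
    """Return the (height, width) after shortest-edge resizing.

    The shorter side is scaled to min_size; if the longer side would then
    exceed max_size, the scale is reduced so the longer side equals max_size.
    """
    scale = min_size / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    # Round to the nearest integer pixel size.
    return int(height * scale + 0.5), int(width * scale + 0.5)


typical = resized_shape(480, 640)    # shorter side reaches 600
panorama = resized_shape(400, 1200)  # longer side is capped at 1000
```

For elongated images the cap on the longer side takes precedence, so the shorter side ends up below 600 pixels.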
diff --git a/configs/Base-RCNN-region-fpn.yaml b/configs/Base-RCNN-region-fpn.yaml
@@ -0,0 +1,45 @@
+MODEL:
+  META_ARCHITECTURE: "GeneralizedRCNN"
+  BACKBONE:
+    NAME: "build_resnet_fpn_backbone"
+  ATTRIBUTE_ON: True
+  RESNETS:
+    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
+  FPN:
+    IN_FEATURES: ["res2", "res3", "res4", "res5"]
+  ANCHOR_GENERATOR:
+    SIZES: [[32], [64], [128], [256], [512]]  # One size for each in feature map
+    ASPECT_RATIOS: [[0.5, 1.0, 2.0]]  # Three aspect ratios (same for all in feature maps)
+  RPN:
+    IN_FEATURES: ["p2", "p3", "p4", "p5", "p6"]
+    PRE_NMS_TOPK_TRAIN: 2000  # Per FPN level
+    PRE_NMS_TOPK_TEST: 1000
+    POST_NMS_TOPK_TRAIN: 2000
+    POST_NMS_TOPK_TEST: 1000
+    SMOOTH_L1_BETA: 0.1111
+    BOUNDARY_THRESH: 0
+  ROI_HEADS:
+    NAME: "AttributeStandardROIHeads"
+    IN_FEATURES: ["p2", "p3", "p4", "p5"]
+    NUM_CLASSES: 1600
+  ROI_BOX_HEAD:
+    NAME: "AttributeFastRCNNConvFCHead"
+    NUM_FC: 2
+    FC_DIM: 2048
+    POOLER_RESOLUTION: 7
+    POOLER_SAMPLING_RATIO: 2
+    SMOOTH_L1_BETA: 1.
+DATASETS:
+  TRAIN: ("visual_genome_train", "visual_genome_val")
+  TEST: ("visual_genome_test",)
+SOLVER:
+  IMS_PER_BATCH: 16
+  BASE_LR: 0.02
+  STEPS: (60000, 80000)
+  MAX_ITER: 90000
+INPUT:
+  MIN_SIZE_TRAIN: (600,)
+  MAX_SIZE_TRAIN: 1000
+  MIN_SIZE_TEST: 600
+  MAX_SIZE_TEST: 1000
+VERSION: 2
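
The `ANCHOR_GENERATOR` block above assigns one anchor size to each RPN input level and shares three aspect ratios across levels, which gives three anchors per location. The sketch below derives the anchor shapes, assuming the common convention that an anchor of size `s` has area `s * s` and that the aspect ratio is height divided by width. The pairing of `p2` to `p6` with sizes 32 to 512 follows the order of the lists in the config, and `anchor_shapes` is a hypothetical helper.

```python
import math

SIZES = [[32], [64], [128], [256], [512]]  # one size per feature level
ASPECT_RATIOS = [0.5, 1.0, 2.0]            # shared by all levels
LEVELS = ["p2", "p3", "p4", "p5", "p6"]    # RPN.IN_FEATURES


def anchor_shapes(size, aspect_ratios):
    """Return one (width, height) pair per aspect ratio.

    Assumes area = size ** 2 and aspect ratio = height / width.
    """
    shapes = []
    for ratio in aspect_ratios:
        width = math.sqrt(size * size / ratio)
        shapes.append((width, ratio * width))
    return shapes


# Map each level to the shapes of the anchors placed at every location.
anchors_per_level = {
    level: anchor_shapes(sizes[0], ASPECT_RATIOS)
    for level, sizes in zip(LEVELS, SIZES)
}
```

Every anchor on a level has the same area, so the three ratios only trade width against height.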
diff --git a/configs/R-50-updn.yaml b/configs/R-50-updn.yaml
diff --git a/configs/X-152-region-c4.yaml b/configs/X-152-region-c4.yaml
@@ -0,0 +1,8 @@
+_BASE_: "Base-RCNN-region-c4.yaml"
+MODEL:
+  WEIGHTS: "catalog://ImageNetPretrained/FAIR/X-152-32x8d-IN5k"
+  RESNETS:
+    STRIDE_IN_1X1: False  # this is a C2 model
+    NUM_GROUPS: 32
+    WIDTH_PER_GROUP: 8
+    DEPTH: 152
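
Each `X-152-region-*.yaml` file overrides only a few keys on top of its `_BASE_` file. The sketch below illustrates that layering with plain dictionaries. It is a simplified stand-in for the config loader, not Detectron2's implementation; `merge_config` is a hypothetical helper, and only a subset of the keys from the two files is reproduced.

```python
import copy


def merge_config(base, override):
    """Recursively overlay `override` on `base` and return a new dict."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections are merged key by key rather than replaced.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Subset of Base-RCNN-region-c4.yaml.
base = {
    "MODEL": {
        "ATTRIBUTE_ON": True,
        "ROI_HEADS": {"NAME": "AttributeRes5ROIHeads", "NUM_CLASSES": 1600},
    },
    "SOLVER": {"BASE_LR": 0.02, "MAX_ITER": 90000},
}

# Subset of X-152-region-c4.yaml (the file that names the base).
override = {
    "MODEL": {
        "RESNETS": {
            "STRIDE_IN_1X1": False,
            "NUM_GROUPS": 32,
            "WIDTH_PER_GROUP": 8,
            "DEPTH": 152,
        },
    },
}

merged = merge_config(base, override)
```

The merged result keeps the base file's ROI head and solver settings and gains the `RESNETS` section from the derived file.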
diff --git a/configs/X-152-region-dc5.yaml b/configs/X-152-region-dc5.yaml
@@ -0,0 +1,8 @@
+_BASE_: "Base-RCNN-region-dc5.yaml"
+MODEL:
+  WEIGHTS: "catalog://ImageNetPretrained/FAIR/X-152-32x8d-IN5k"
+  RESNETS:
+    STRIDE_IN_1X1: False  # this is a C2 model
+    NUM_GROUPS: 32
+    WIDTH_PER_GROUP: 8
+    DEPTH: 152
diff --git a/configs/X-152-region-fpn.yaml b/configs/X-152-region-fpn.yaml
@@ -0,0 +1,8 @@
+_BASE_: "Base-RCNN-region-fpn.yaml"
+MODEL:
+  WEIGHTS: "catalog://ImageNetPretrained/FAIR/X-152-32x8d-IN5k"
+  RESNETS:
+    STRIDE_IN_1X1: False  # this is a C2 model
+    NUM_GROUPS: 32
+    WIDTH_PER_GROUP: 8
+    DEPTH: 152