This repository contains inference and training code for Mask R-CNN like networks. Models code is designed to enable ONNX export (with custom operations) and inference on CPU via OpenVINO. Detectron and maskrcnn-benchmark models are wrapped to export their weights to ONNX and OpenVINO.
- Ubuntu 16.04
- Python 3.5.2
- PyTorch 0.4.1
- OpenVINO 2019 R1 with Python API
To install required dependencies run
$ cat requirements.txt | xargs -n 1 -L 1 pip3 install
To install the package itself run
$ pip3 install -e .
A bunch of top-performing models from Detectron and maskrcnn-benchmark projects
can be easily obtained and automatically prepared for running in PyTorch and
OpenVINO, via tools/download_pretrained_weights.py
script. By default script
requires no parameters to run, for more details on its configuration run it with
a -h
key. This script could also be considered as a sample, showing how models
are supposed to be converted to ONNX/IR.
To be able to train networks and/or get quality metrics for pre-trained ones,
one have to download MS COCO dataset. Download the
dataset (train, val and annotations) and
unpack it to ./data/coco
folder. The result structure of the folder should be:
data
└── coco
├── annotations
│ ├── instances_train2017.json
│ └── instances_val2017.json
|
└── images
├── train2017
└── val2017
To train Mask R-CNN model run
$ python3 tools/train.py
This script has a default configuration which conforms to end-to-end Mask R-CNN baseline from Detectron. To view all arguments available to configure run
$ python3 tools/train.py --help
tools/demo.py
script implements a live demo application, that runs a given
Mask R-CNN like model on a set of images or a video and shows resulting instance
segmentation mask. Both PyTorch and OpenVINO backends are supported.
As input, the demo application takes:
- a model
- a path to PyTorch
nn.Module
implementing the network of interest specified with a command line argument--model
- a path to a file with pre-trained weights specified with a command line
argument
--ckpt
- a path to PyTorch
- a source of data:
- a path to a single image file or a folder with image files specified with
a command line argument
--images
- a path to a video file or a numeric ID of a web-camera specified with
a command line argument
--video
- a path to a single image file or a folder with image files specified with
a command line argument
- preprocessing parameters
- an image resize mode and target resolution. Two options are available here.
--fix_max SCALE MAXSIZE
command line argument forces image to be resized to such a size, that its shorter and larger sides are not greater thanSCALE
andMAXSIZE
respectively, while original aspect ratio is left unchanged.--fit_window MAXHEIGHT MAXWIDTH
option enables the mode, when image height and width are made not greater thatMAXHEIGHT
andMAXWIDTH
respectively, while original aspect ratio is preserved. - a mean value subtracted from every pixel of an image
(
--mean_pixel
argument)
- an image resize mode and target resolution. Two options are available here.
- extra options controlling visualization and performance statistics collection.
Please refer to script's help message (run it with
-h
argument) for more details.
For example, assuming that tools/download_pretrained_weights.py
script with
default options has been used to fetch public pre-trained models, to run demo on
live video stream from a webcam using ResNet50-FPN Mask R-CNN model for instance
segmentation run the following command:
$ python3 tools/demo.py \
--dataset coco_2017_val \
--ckpt data/pretrained_models/converted/coco/detectron/mask_rcnn_resnet50_fpn_2x.pth \
--mean_pixel 102.9801 115.9465 122.7717 \
--fit_window 800 1333 \
--video 0 \
--delay 1 \
--show_fps \
pytorch \
--model segmentoly.rcnn.model_zoo.resnet_fpn_mask_rcnn.ResNet50FPNMaskRCNN \
--show_flops
Note: Use
CUDA_VISIBLE_DEVICES
environment variable to configure which (if any) GPUs to use during evaluation. If empty value is assigned, PyTorch uses CPU backend.
The same demo application may be used for running instance segmentation models
on CPU via OpenVINO. Almost the same set of parameters is available in this
case. The major difference is that model (--model
argument) should be defined
as a path to an XML file with OpenVINO IR description, rather that a PyTorch
class, and --ckpt
argument should point to a BIN file with OpenVINO IR
weights.
Example:
$ python3 tools/demo.py \
--dataset coco_2017_val \
--ckpt data/pretrained_models/ir/coco/detectron/mask_rcnn_resnet50_fpn_2x.bin \
--fit_window 800 1333 \
--video 0 \
--delay 1 \
--show_fps \
openvino \
--model data/pretrained_models/ir/coco/detectron/mask_rcnn_resnet50_fpn_2x.xml
Note: For most of the Detectron and maskrcnn-benchmark models mean pixel value of [102.9801, 115.9465, 122.7717] is used while running with a PyTorch backend. But at the same time this value is integrated into the model itself by OpenVINO Model Optimizer during export to IR, so there is no need to specify this value when running with an OpenVINO backend.
tools/test.py
script is designed for quality evaluation of instance
segmentation models. The script has almost the same interface as a demo script,
and supports both PyTorch and OpenVINO backends.
For example, to evaluate ResNet50-FPN Mask R-CNN model on COCO 2017 Val dataset using PyTorch backend run:
$ python3 tools/test.py \
--dataset coco_2017_val \
--ckpt data/pretrained_models/converted/coco/detectron/mask_rcnn_resnet50_fpn_2x.pth \
--mean_pixel 102.9801 115.9465 122.7717 \
--fit_max 800 1333 \
pytorch \
--model segmentoly.rcnn.model_zoo.resnet_fpn_mask_rcnn.ResNet50FPNMaskRCNN \
--show_flops
Note: Use
CUDA_VISIBLE_DEVICES
environment variable to configure which (if any) GPUs to use during evaluation. If empty value is assigned, PyTorch uses CPU backend.
For example, to evaluate ResNet50-FPN Mask R-CNN model on COCO 2017 Val dataset using OpenVINO backend run:
$ python3 tools/test.py \
--dataset coco_2017_val \
--ckpt data/pretrained_models/ir/coco/detectron/mask_rcnn_resnet50_fpn_2x.bin \
--fit_window 800 1333 \
openvino \
--model data/pretrained_models/ir/coco/detectron/mask_rcnn_resnet50_fpn_2x.xml
Note: Default quality evaluation protocol for Mask R-CNN model uses
fit_max
image resize mode at a preprocessing stage (see the note above about resize modes). Though, by default OpenVINO IR models created bytools/download_pretrained_weights.py
script are configured to work properly withfit_window
mode only. This has no difference for landscape-oriented images, but affects portrait-oriented ones. So to directly reproduce reference quality numbers one has to export PyTorch models to OpenVINO IR manually settingMAXSIZE
xMAXSIZE
input resolution. This will be fixed in later releases.
To run the model via OpenVINO one has to export PyTorch model to ONNX first and then convert it to OpenVINO Internal Representation (IR) using Model Optimizer.
tools/convert_to_onnx.py
script exports a given model to ONNX representation.
As input, the script takes:
- a model
- a path to PyTorch
nn.Module
implementing the network of interest specified with a command line argument--model
- a path to a file with pre-trained weights specified with a command line
argument
--ckpt
- a path to PyTorch
- a number of classes that network detects specified either directly using
a
-nc NUMBER_OF_CLASSES
argument or implicitly by specifying a dataset the network was trained or supposed to work on using a--dataset DATASET_NAME
argument - an output ONNX file path specified with a command line argument
--output_file
- an input resolution the network should work on specified with a command line
argument
--input_size
in aHEIGHT WIDTH
format.
For example, here is the command used inside
tools/download_pretrained_weights.py
script to export ResNet50-FPN Mask R-CNN
model to ONNX representation:
$ python3 tools/convert_to_onnx.py \
--model segmentoly.rcnn.model_zoo.resnet_fpn_mask_rcnn.ResNet50FPNMaskRCNN \
--ckpt data/pretrained_models/converted/coco/detectron/mask_rcnn_resnet50_fpn_2x.pth \
--input_size 800 1344 \
--dataset coco_2017_val \
--show_flops \
--output_file data/pretrained_models/onnx/coco/detectron/mask_rcnn_resnet50_fpn_2x.onnx
Note: Most of FPN Mask R-CNN models assumes to have an input with a size divisible by 32. So even when image resize is configured to work in
fit_max
mode withSCALE
800 andMAXSIZE
1333, maximal input resolution is actually 800x1344 to ensure divisibility.
Conversion from ONNX model representation to OpenVINO IR is straightforward and handled by OpenVINO Model Optimizer. Please refer to Model Optimizer documentation for details on how it works.
For example, here is the command used inside
tools/download_pretrained_weights.py
script to export ResNet50-FPN Mask R-CNN
model to IR given its ONNX representation:
$ mo.py \
--framework onnx \
--input_model data/pretrained_models/onnx/coco/detectron/mask_rcnn_resnet50_fpn_2x.onnx \
--output_dir data/pretrained_models/ir/coco/detectron/ \
--input "im_data,im_info" \
--output "boxes,scores,classes,batch_ids,raw_masks" \
--mean_values "im_data[102.9801,115.9465,122.7717],im_info[0,0,0]"