Add YOLO and tiny YOLO v2 models w/ sparsities.
Add descriptions for all three models.
Add model link.
Update yolo-tiny-v2-ava-0001.md
Update yolo-tiny-v2-ava-0001.md
Update yolo-tiny-v2-ava-0001.md
Update yolo-tiny-v2-ava-sparse-30-0001.md
Update yolo-tiny-v2-ava-sparse-30-0001.md
Update yolo-tiny-v2-ava-sparse-30-0001.md
Update yolo-tiny-v2-ava-sparse-60-0001.md
Update yolo-tiny-v2-ava-0001.md
Update yolo-tiny-v2-ava-0001.md
Add sparse YOLO v2 models.
Update models/intel/yolo-tiny-v2-ava-0001/description/yolo-tiny-v2-ava-0001.md
Co-Authored-By: Roman Donchenko <[email protected]>
add FP16 configs
fix models readme names
Showing 13 changed files with 439 additions and 5 deletions.
.../yolo-tiny-v2-ava-sparse-30-0001/description/yolo-tiny-v2-ava-sparse-30-0001.md (48 additions, 0 deletions)
# yolo-tiny-v2-ava-sparse-30-0001

## Use Case and High-Level Description

This is a re-implemented and re-trained version of the [tiny YOLO v2](https://arxiv.org/abs/1612.08242) object detection network, trained on the VOC2012 training dataset.
[Network weight pruning](https://arxiv.org/abs/1710.01878) is applied to sparsify the convolution layers (30% of the network parameters are set to zero).
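As a rough, hypothetical illustration of what magnitude-based pruning does to a weight tensor, the sketch below zeroes the smallest-magnitude 30% of a randomly initialized convolution kernel in NumPy; the referenced paper prunes gradually during training rather than in one shot, and the tensor shape used here is an arbitrary assumption.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude values so that roughly `sparsity`
    of the tensor becomes exactly zero (one-shot sketch, not the paper's
    gradual schedule)."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

# Hypothetical 3x3 convolution kernel pruned to 30% sparsity.
kernel = np.random.randn(16, 8, 3, 3).astype(np.float32)
pruned = magnitude_prune(kernel, sparsity=0.30)
print(1.0 - np.count_nonzero(pruned) / pruned.size)  # ~0.30
```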
## Example

## Specification

| Metric                       | Value        |
|------------------------------|--------------|
| Mean Average Precision (mAP) | 36.33%       |
| Flops                        | 6.97Bn*      |
| Source framework             | TensorFlow** |

The Average Precision metric is described in: Mark Everingham et al.,
["The PASCAL Visual Object Classes (VOC) Challenge"](http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf).
Tested on the VOC 2012 validation dataset.
## Performance

## Inputs

1. Name: "input", shape: [1x3x416x416] - an input image in the format [BxCxHxW], where:
    - B - batch size
    - C - number of channels
    - H - image height
    - W - image width

   The expected color order is BGR. A minimal preprocessing sketch follows this list.
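The following is a minimal sketch of how such a blob can be prepared, assuming OpenCV (which already loads images in BGR order); the file name is hypothetical, and any mean/scale normalization the model may expect is not covered by this description.

```python
import cv2
import numpy as np

def preprocess(image_path: str) -> np.ndarray:
    """Turn an image file into the [1x3x416x416] BGR blob described above."""
    image = cv2.imread(image_path)          # OpenCV reads images as BGR (HxWxC)
    image = cv2.resize(image, (416, 416))   # resize to the network resolution
    blob = image.transpose(2, 0, 1)         # HWC -> CHW
    return np.expand_dims(blob, axis=0)     # add the batch dimension

input_blob = preprocess("example.jpg")      # hypothetical input file
print(input_blob.shape)                     # (1, 3, 416, 416)
```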
## Outputs

1. The net outputs a blob with the shape [1, 21125], which can be reshaped to [5, 25, 13, 13],
   where the dimensions correspond to [`num_anchors`, `cls_reg_obj_params`, `y_loc`, `x_loc`], respectively.
    - `num_anchors`: number of anchor boxes; each spatial location specified by `y_loc` and `x_loc` has 5 anchors.
    - `cls_reg_obj_params`: parameters for classification and regression. The values consist of the following:
        * Regression parameters (4)
        * Objectness score (1)
        * Class scores (20)
    - `y_loc` and `x_loc`: spatial location of each grid cell.

   A NumPy sketch of this layout follows this list.
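To make the layout concrete, here is a minimal NumPy sketch that splits the reshaped blob into its three parts; the ordering within the four regression parameters and the subsequent box decoding (anchor sizes, activations, non-maximum suppression) are outside the scope of this description and are not assumed here.

```python
import numpy as np

def split_yolo_output(blob: np.ndarray):
    """Reshape the [1, 21125] output and split the 25 per-anchor values into
    box regression, objectness and class scores (layout sketch only)."""
    grid = blob.reshape(5, 25, 13, 13)      # [num_anchors, cls_reg_obj_params, y_loc, x_loc]
    box_regression = grid[:, 0:4]           # 4 regression parameters
    objectness     = grid[:, 4:5]           # 1 objectness score
    class_scores   = grid[:, 5:25]          # 20 VOC class scores
    return box_regression, objectness, class_scores

raw = np.random.rand(1, 21125).astype(np.float32)  # stand-in for a real inference result
boxes, obj, classes = split_yolo_output(raw)
print(boxes.shape, obj.shape, classes.shape)        # (5, 4, 13, 13) (5, 1, 13, 13) (5, 20, 13, 13)
```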
## Legal Information

[*] Same as the original implementation.

[**] Other names and brands may be claimed as the property of others.
.../yolo-tiny-v2-ava-sparse-60-0001/description/yolo-tiny-v2-ava-sparse-60-0001.md (49 additions, 0 deletions)
# yolo-tiny-v2-ava-sparse-60-0001

## Use Case and High-Level Description

This is a re-implemented and re-trained version of the [tiny YOLO v2](https://arxiv.org/abs/1612.08242) object detection network, trained on the VOC2012 training dataset.
[Network weight pruning](https://arxiv.org/abs/1710.01878) is applied to sparsify the convolution layers (60% of the network parameters are set to zero).

## Example

## Specification

| Metric                       | Value        |
|------------------------------|--------------|
| Mean Average Precision (mAP) | 35.32%       |
| Flops                        | 6.97Bn*      |
| Source framework             | TensorFlow** |

The Average Precision metric is described in: Mark Everingham et al.,
["The PASCAL Visual Object Classes (VOC) Challenge"](http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf).
Tested on the VOC 2012 validation dataset.

## Performance

## Inputs

1. Name: "input", shape: [1x3x416x416] - an input image in the format [BxCxHxW], where:
    - B - batch size
    - C - number of channels
    - H - image height
    - W - image width

   The expected color order is BGR.

## Outputs

1. The net outputs a blob with the shape [1, 21125], which can be reshaped to [5, 25, 13, 13],
   where the dimensions correspond to [`num_anchors`, `cls_reg_obj_params`, `y_loc`, `x_loc`], respectively.
    - `num_anchors`: number of anchor boxes; each spatial location specified by `y_loc` and `x_loc` has 5 anchors.
    - `cls_reg_obj_params`: parameters for classification and regression. The values consist of the following:
        * Regression parameters (4)
        * Objectness score (1)
        * Class scores (20)
    - `y_loc` and `x_loc`: spatial location of each grid cell.

## Legal Information

[*] Same as the original implementation.

[**] Other names and brands may be claimed as the property of others.
models/intel/yolo-v2-ava-0001/description/yolo-v2-ava-0001.md (47 additions, 0 deletions)
# yolo-v2-ava-0001

## Use Case and High-Level Description

This is a re-implemented and re-trained version of the [YOLO v2](https://arxiv.org/abs/1612.08242) object detection network, trained on the VOC2012 training dataset.

## Example

## Specification

| Metric                       | Value        |
|------------------------------|--------------|
| Mean Average Precision (mAP) | 63.9%        |
| Flops                        | 48.29Bn*     |
| Source framework             | TensorFlow** |

The Average Precision metric is described in: Mark Everingham et al.,
["The PASCAL Visual Object Classes (VOC) Challenge"](http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf).
Tested on the VOC 2012 validation dataset.

## Performance

## Inputs

1. Name: "input", shape: [1x3x416x416] - an input image in the format [BxCxHxW], where:
    - B - batch size
    - C - number of channels
    - H - image height
    - W - image width

   The expected color order is BGR.

## Outputs

1. The net outputs a blob with the shape [1, 21125], which can be reshaped to [5, 25, 13, 13],
   where the dimensions correspond to [`num_anchors`, `cls_reg_obj_params`, `y_loc`, `x_loc`], respectively.
    - `num_anchors`: number of anchor boxes; each spatial location specified by `y_loc` and `x_loc` has 5 anchors.
    - `cls_reg_obj_params`: parameters for classification and regression. The values consist of the following:
        * Regression parameters (4)
        * Objectness score (1)
        * Class scores (20)
    - `y_loc` and `x_loc`: spatial location of each grid cell.

## Legal Information

[*] Same as the original implementation.

[**] Other names and brands may be claimed as the property of others.
models/intel/yolo-v2-ava-sparse-35-0001/description/yolo-v2-ava-sparse-35-0001.md (48 additions, 0 deletions)
# yolo-v2-ava-sparse-35-0001

## Use Case and High-Level Description

This is a re-implemented and re-trained version of the [YOLO v2](https://arxiv.org/abs/1612.08242) object detection network, trained on the VOC2012 training dataset.
[Network weight pruning](https://arxiv.org/abs/1710.01878) is applied to sparsify the convolution layers (35% of the network parameters are set to zero).

## Example

## Specification

| Metric                       | Value        |
|------------------------------|--------------|
| Mean Average Precision (mAP) | 63.71%       |
| Flops                        | 48.29Bn*     |
| Source framework             | TensorFlow** |

The Average Precision metric is described in: Mark Everingham et al.,
["The PASCAL Visual Object Classes (VOC) Challenge"](http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf).
Tested on the VOC 2012 validation dataset.

## Performance

## Inputs

1. Name: "input", shape: [1x3x416x416] - an input image in the format [BxCxHxW], where:
    - B - batch size
    - C - number of channels
    - H - image height
    - W - image width

   The expected color order is BGR.

## Outputs

1. The net outputs a blob with the shape [1, 21125], which can be reshaped to [5, 25, 13, 13],
   where the dimensions correspond to [`num_anchors`, `cls_reg_obj_params`, `y_loc`, `x_loc`], respectively.
    - `num_anchors`: number of anchor boxes; each spatial location specified by `y_loc` and `x_loc` has 5 anchors.
    - `cls_reg_obj_params`: parameters for classification and regression. The values consist of the following:
        * Regression parameters (4)
        * Objectness score (1)
        * Class scores (20)
    - `y_loc` and `x_loc`: spatial location of each grid cell.

## Legal Information

[*] Same as the original implementation.

[**] Other names and brands may be claimed as the property of others.
models/intel/yolo-v2-ava-sparse-70-0001/description/yolo-v2-ava-sparse-70-0001.md (49 additions, 0 deletions)
# yolo-v2-ava-sparse-70-0001

## Use Case and High-Level Description

This is a re-implemented and re-trained version of the [YOLO v2](https://arxiv.org/abs/1612.08242) object detection network, trained on the VOC2012 training dataset.
[Network weight pruning](https://arxiv.org/abs/1710.01878) is applied to sparsify the convolution layers (70% of the network parameters are set to zero).

## Example

## Specification

| Metric                       | Value        |
|------------------------------|--------------|
| Mean Average Precision (mAP) | 62.9%        |
| Flops                        | 48.29Bn*     |
| Source framework             | TensorFlow** |

The Average Precision metric is described in: Mark Everingham et al.,
["The PASCAL Visual Object Classes (VOC) Challenge"](http://host.robots.ox.ac.uk/pascal/VOC/pubs/everingham10.pdf).
Tested on the VOC 2012 validation dataset.

## Performance

## Inputs

1. Name: "input", shape: [1x3x416x416] - an input image in the format [BxCxHxW], where:
    - B - batch size
    - C - number of channels
    - H - image height
    - W - image width

   The expected color order is BGR.

## Outputs

1. The net outputs a blob with the shape [1, 21125], which can be reshaped to [5, 25, 13, 13],
   where the dimensions correspond to [`num_anchors`, `cls_reg_obj_params`, `y_loc`, `x_loc`], respectively.
    - `num_anchors`: number of anchor boxes; each spatial location specified by `y_loc` and `x_loc` has 5 anchors.
    - `cls_reg_obj_params`: parameters for classification and regression. The values consist of the following:
        * Regression parameters (4)
        * Objectness score (1)
        * Class scores (20)
    - `y_loc` and `x_loc`: spatial location of each grid cell.

## Legal Information

[*] Same as the original implementation.

[**] Other names and brands may be claimed as the property of others.