Pipeline issues to keep GPUs utilised when number of instance masks are 20+ per image

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [-] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- [x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
- [x] I checked to make sure that this issue has not already been filed.

## 1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/...

## 2. Describe the bug

32 CPU cores do not seem to be enough to keep 6x K80 GPUs busy in some scenarios, while other scenarios show no problem. In the good case, all 6 GPUs show nearly 100% utilisation. In the bad case, the GPUs take turns reaching 100% utilisation instead of all being busy at once. However, CPU utilisation across the 32 cores is only ~50%.


## 3. Steps to reproduce

I am using the legacy training script because of its multi-GPU support for TF1:
`object_detection/legacy/train.py --num_workers=6 --ps_tasks=1`

I use the `mask_rcnn_inception_v2_coco.config` and the pretrained model, with batch_size = 12 (2 per GPU).

My images and masks all have the same width and height, so no resizing is needed. Everything is stored in 10 TFRecord shards.

I have successfully trained models.

However, the main difference I can see between the scenarios is whether the number of instance masks is particularly high.

In the bad case, there are more than 20 and up to 60 instance masks per image sample. That seems to be why GPU utilisation drops by about 4x.

No other augmentation or resizing is needed. Everything is pre-calculated and stored across the 10 TFRecord shards.
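To put the mask counts above into perspective, here is a rough back-of-the-envelope sketch of how much mask data has to be prepared on the CPU per batch. It assumes the masks are held as dense float32 arrays of shape `[num_masks, height, width]` once decoded, and it uses a hypothetical image size of 1024 x 1024 (the actual size is not stated in this report). The function names are made up for illustration and are not part of the Object Detection API.

```python
def mask_bytes_per_image(num_masks, height, width, bytes_per_value=4):
    """Bytes needed for a dense [num_masks, height, width] mask array.

    bytes_per_value=4 assumes float32 masks (an assumption, see lead-in).
    """
    return num_masks * height * width * bytes_per_value


def mask_bytes_per_batch(num_masks, height, width, batch_size,
                         bytes_per_value=4):
    """Bytes of mask data for a whole batch with num_masks masks per image."""
    per_image = mask_bytes_per_image(num_masks, height, width, bytes_per_value)
    return batch_size * per_image


# Hypothetical image size; replace with the real width and height.
HEIGHT, WIDTH = 1024, 1024
BATCH_SIZE = 12  # as in this report (2 per GPU on 6 GPUs)

for num_masks in (5, 20, 60):
    mib = mask_bytes_per_batch(num_masks, HEIGHT, WIDTH, BATCH_SIZE) / 2**20
    print("{:>2} masks/image: {:.0f} MiB of mask data per batch".format(
        num_masks, mib))
```

Under these assumptions the amount of mask data grows linearly with the number of instances, so going from a handful of masks to 60 per image multiplies the decoding and copying work per sample accordingly.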


## 4. Expected behavior

I expected 100% utilisation of the GPUs and did not expect the CPU to be the bottleneck.
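One way to check whether the input side is really the limiting factor would be to time it in isolation, without any GPU work, and compare the batches per second for the low-mask and high-mask datasets. The following is only a minimal sketch: `measure_throughput` and `fake_batches` are hypothetical helpers, not part of TensorFlow or the Object Detection API, and `fake_batches` stands in for whatever iterator actually yields the decoded training batches.

```python
import time


def measure_throughput(batches, num_batches, warmup=2):
    """Return batches per second drawn from an iterable.

    The first `warmup` batches are consumed but not timed, so that
    one-off start-up costs do not distort the measurement.
    """
    iterator = iter(batches)
    for _ in range(warmup):
        next(iterator)
    start = time.perf_counter()
    for _ in range(num_batches):
        next(iterator)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast iterators.
    return num_batches / max(elapsed, 1e-9)


def fake_batches(seconds_per_batch):
    """Stand-in for a real input pipeline: sleeps, then yields a dummy batch."""
    while True:
        time.sleep(seconds_per_batch)
        yield object()


# Simulated pipeline that needs about 10 ms of CPU-side work per batch.
rate = measure_throughput(fake_batches(0.01), num_batches=10)
print("input pipeline alone: {:.1f} batches/s".format(rate))
```

If the rate measured this way for the high-mask data is well below what the GPUs consume in the good case, that would point to the input pipeline rather than the model.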

## 5. Additional context

I set the following in the .config file:
```
model {
    image_resizer {
        identity_resizer {
        }
    }
}
train_input_reader {
    batch_queue_capacity: 256
    num_batch_queue_threads: 32 
    prefetch_queue_capacity: 256
}
```

This had no effect at all.

## 6. System information

- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04 LTS
- Mobile device name if the issue happens on a mobile device:
- TensorFlow installed from (source or binary): binary 
- TensorFlow version (use command below): 1.15.3
- Python version: 3.6
- Bazel version (if compiling from source): na
- GCC/Compiler version (if compiling from source): na
- CUDA/cuDNN version: 10.0
- GPU model and memory: 6 x K80 11GB

Pipeline issues to keep GPUs utilised when number of instance masks are 20+ per image #9463
