Error out of GPU memory during model training 

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [N] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- [Y] I am reporting the issue to the correct repository. (Model Garden official or research directory)
- [Y] I checked to make sure that this issue has not already been filed.

## 1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py

## 2. Describe the bug

I want to train Mask R-CNN Inception ResNet V2 1024x1024. My dataset is converted to `.record` files, the pipeline config is set up, and the GPU works when training other models. I tried limiting GPU memory (which works when training other models), but the error still appears.

Error:

```
2020-10-06 12:10:44.322216: E tensorflow/stream_executor/cuda/cuda_driver.cc:825] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-10-06 12:10:44.322569: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 17179869184
```
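For scale, the size in the log can be converted to GiB. This is a minimal sketch; the helper name `bytes_to_gib` is mine and is not part of TensorFlow. Note that both log lines report a pinned host memory allocation ("on host"), not a device allocation.

```python
# Size copied from the two log lines above (bytes).
REQUESTED_BYTES = 17179869184

def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30

requested_gib = bytes_to_gib(REQUESTED_BYTES)
print(f"Requested pinned host allocation: {requested_gib:.1f} GiB")
```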

## 3. Steps to reproduce

Run from `~/models/research`:

`python object_detection/model_main_tf2.py --pipeline_config_path=/home/robotronics/Projects/blm_Mask_RCNN/model_MaskRCNN/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8/model.config --model_dir=/home/robotronics/Projects/blm_Mask_RCNN/blm/models/model --num_train_steps=5000 --sample_1_of_n_eval_examples=10 --alsologstostderr`

## 4. Expected behavior

Training runs to completion without an out-of-memory error.

## 5. Additional context

I tried to limit GPU memory by adding the following to both `model_main_tf2.py` and `model_lib_v2.py`:
```
import tensorflow as tfl

gpus = tfl.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Cap the first GPU at 1024 MB by exposing it as a single logical device
    tfl.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tfl.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tfl.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be configured before GPUs have been initialized
    print(e)
```
I also ran the examples from the documentation at https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/auto_examples/plot_object_detection_checkpoint.html, and they work as well.

## 6. System information

- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- Mobile device name if the issue happens on a mobile device:
- TensorFlow installed from (source or binary): 2.2.0
- TensorFlow version (use command below): 2.2.0 
- Python version: 3.6.9
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: 10.1/7.6.5
- GPU model and memory: GeForce GTX 1080 


