Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [N] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
- [Y] I am reporting the issue to the correct repository. (Model Garden official or research directory)
- [Y] I checked to make sure that this issue has not already been filed.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py
2. Describe the bug
I want to train the Mask R-CNN Inception ResNet V2 1024x1024 model. My dataset is converted to a .record file, the pipeline config is set up, and the GPU works when training other models. I tried limiting the GPU memory (which also works when training other models), but the error still appears.
Error:
2020-10-06 12:10:44.322216: E tensorflow/stream_executor/cuda/cuda_driver.cc:825] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-10-06 12:10:44.322569: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 17179869184
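For reference, the size in the log can be converted to a more readable unit. The sketch below only parses the first log line quoted above; the helper name `parse_failed_alloc` is made up for illustration. It shows that the failed request is for 16 GiB and that the log reports it as a host (pinned) allocation rather than a GPU device allocation.

```python
import re

# First error line, copied verbatim from the log above.
LOG_LINE = (
    "2020-10-06 12:10:44.322216: E tensorflow/stream_executor/cuda/cuda_driver.cc:825] "
    "failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory"
)


def parse_failed_alloc(line):
    """Return (size_in_bytes, location) from a 'failed to alloc' log line, or None.

    Hypothetical helper for this issue, not part of TensorFlow.
    """
    match = re.search(r"failed to alloc (\d+) bytes on (\w+)", line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


size_bytes, location = parse_failed_alloc(LOG_LINE)
size_gib = size_bytes / 2**30  # bytes -> GiB
print(f"{size_gib:.0f} GiB requested on {location}")
```

Since the request is reported as being on the host, a limit applied to GPU device memory may not change it.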
3. Steps to reproduce
# From ~/models/research
python object_detection/model_main_tf2.py --pipeline_config_path=/home/robotronics/Projects/blm_Mask_RCNN/model_MaskRCNN/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8/model.config --model_dir=/home/robotronics/Projects/blm_Mask_RCNN/blm/models/model --num_train_steps=5000 --sample_1_of_n_eval_examples=10 --alsologtostderr
4. Expected behavior
Training runs to completion without an out-of-memory error.
5. Additional context
I tried to limit the GPU memory by adding the following to model_main_tf2.py and model_lib_v2.py:
import tensorflow as tfl
gpus = tfl.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Cap the first GPU at 1024 MB by creating a single virtual device on it
        tfl.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tfl.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
        logical_gpus = tfl.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be configured before GPUs have been initialized
        print(e)
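A variant I have not confirmed against this error is to enable memory growth instead of a fixed limit. The sketch below uses `tf.config.experimental.set_memory_growth`; the wrapper name `enable_memory_growth` is hypothetical, and the import is placed inside the function only so the helper can be defined before TensorFlow is loaded. As far as I understand, memory growth and a virtual device configuration are alternatives for a given GPU, so this would replace the snippet above rather than be added to it.

```python
def enable_memory_growth():
    """Let TensorFlow allocate GPU memory on demand instead of reserving it up front.

    Hypothetical helper for this issue. Call it before any op touches the GPU.
    Returns the number of physical GPUs found.
    """
    import tensorflow as tf  # lazy import; requires a TensorFlow 2.x install

    gpus = tf.config.experimental.list_physical_devices('GPU')
    try:
        for gpu in gpus:
            # Apply the same setting to every visible GPU.
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Raised if the GPUs were already initialized when this was called.
        print(e)
    return len(gpus)
```

It would be called once at the top of model_main_tf2.py, for example `enable_memory_growth()` right after the imports. This setting governs GPU device memory only, so it may have no effect on the host allocation reported in the log.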
I ran the examples from the documentation at https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/auto_examples/plot_object_detection_checkpoint.html and they also work.
6. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- Mobile device name if the issue happens on a mobile device:
- TensorFlow installed from (source or binary): 2.2.0
- TensorFlow version (use command below): 2.2.0
- Python version: 3.6.9
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: 10.1/7.6.5
- GPU model and memory: GeForce GTX 1080