Hi,
Sorry to interrupt. I tried to run this repo on SageMaker with the code below:
```python
from sagemaker.estimator import Estimator

estimator = Estimator(role=role,
                      train_instance_count=1,
                      train_instance_type=instance_type,
                      image_name=ecr_image)
estimator.fit({'training': 's3_path',
               'checkpoint': 's3_path/faster_rcnn_inception_v2_coco_2018_01_28/'})
```
And I got this error:
```
Exception during training: Return Code: 1, CMD: ['/usr/local/bin/python', '/opt/ml/code/tensorflow-models/research/object_detection/model_main.py', '--model_dir', '/opt/ml/model', '--pipeline_config_path', '/opt/ml/input/data/training/pipeline.config', '--num_train_steps', '100']
Traceback (most recent call last):
  File "/opt/ml/code/train", line 83, in
    commandline_util.run_python_script(training_script, default_params)
  File "/opt/ml/code/utils/commandline_util.py", line 34, in run_python_script
    run(script_cmd)
  File "/opt/ml/code/utils/commandline_util.py", line 27, in run
    raise Exception(error_msg)
Exception: Return Code: 1, CMD: ['/usr/local/bin/python', '/opt/ml/code/tensorflow-models/research/object_detection/model_main.py', '--model_dir', '/opt/ml/model', '--pipeline_config_path', '/opt/ml/input/data/training/pipeline.config', '--num_train_steps', '100']
```
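The exception above only reports the return code and the command, not what `model_main.py` itself wrote to stderr, so the real cause is hidden. A minimal sketch of a wrapper that keeps the child's stderr in the error message might look like this. `run_and_report` is a hypothetical helper, not the repo's `commandline_util.run`; it uses `stdout=`/`stderr=` pipes and `universal_newlines` rather than `capture_output`/`text` so it also works on Python 3.6.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return its stdout; on failure, raise with stderr attached.

    Hypothetical helper for debugging, not the repo's own commandline_util.run.
    """
    result = subprocess.run(cmd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)  # decode output as str
    if result.returncode != 0:
        # Include the child's stderr so the underlying traceback is visible.
        raise RuntimeError("Return Code: {}, CMD: {}\nstderr:\n{}".format(
            result.returncode, cmd, result.stderr))
    return result.stdout


if __name__ == "__main__":
    # A child process that fails with a message on stderr.
    try:
        run_and_report([sys.executable, "-c", "import sys; sys.exit('boom')"])
    except RuntimeError as err:
        print(err)
```

Passing the same command list shown in the log as `cmd` inside the container should then print the actual error from `model_main.py` instead of just `Return Code: 1`.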
Are there any suggestions?