Open
Labels
PyTorch, component: training (Relates to the SageMaker Training Platform), type: bug
Description
Describe the bug
I tried to use the max_run parameter of sagemaker.pytorch.estimator.PyTorch to define the maximum run time in seconds, but it doesn't work. See the attached screenshot for an example. In the screenshot, I set max_run to 603 seconds, but the job didn't stop at 603 seconds: the training time had reached 841s when I manually terminated the run.
To reproduce
Just set max_run of sagemaker.pytorch.estimator.PyTorch to any integer value.
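A minimal sketch of the reproduction, for reference. The entry point name, IAM role ARN, and instance type are placeholders that do not come from this report; only max_run, the framework version, and the Python version reflect the values described here.

```python
def build_estimator_kwargs(max_run_seconds: int) -> dict:
    """Collect the keyword arguments passed to the PyTorch estimator."""
    return {
        "entry_point": "train.py",  # hypothetical training script
        "role": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # placeholder role
        "instance_count": 1,
        "instance_type": "ml.g4dn.xlarge",  # assumed GPU instance type
        "framework_version": "2.2.0",
        "py_version": "py310",
        "max_run": max_run_seconds,  # the limit that is expected to stop the job
    }


def launch_training_job(max_run_seconds: int = 603):
    """Start a training job with the given max_run (requires AWS credentials)."""
    # Imported here so the kwargs helper can be inspected without the SDK installed.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(**build_estimator_kwargs(max_run_seconds))
    estimator.fit(wait=False)  # return immediately instead of streaming logs
    return estimator
```

Calling launch_training_job(603) would start a job with the same limit as in the screenshot.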
Expected behavior
I expect the SageMaker training run to terminate once the number of seconds set in max_run has elapsed.
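One way to check this expectation against a finished or stopped job is to compare the limit recorded on the job with its reported training time. The sketch below assumes the elapsed time shown in the screenshot corresponds to the TrainingTimeInSeconds field returned by the DescribeTrainingJob API; the helper names are hypothetical.

```python
from typing import Optional


def runtime_overrun_seconds(description: dict) -> Optional[int]:
    """Return how many seconds the job ran past its limit, or None if unknown."""
    limit = description.get("StoppingCondition", {}).get("MaxRuntimeInSeconds")
    elapsed = description.get("TrainingTimeInSeconds")
    if limit is None or elapsed is None:
        return None
    return max(0, elapsed - limit)


def fetch_description(job_name: str) -> dict:
    """Fetch the job description from SageMaker (requires AWS credentials)."""
    import boto3  # imported lazily so the comparison helper has no dependencies

    client = boto3.client("sagemaker")
    return client.describe_training_job(TrainingJobName=job_name)
```

With the numbers reported above (a limit of 603 and a training time of 841), runtime_overrun_seconds would return 238. If MaxRuntimeInSeconds on the job does not match the value passed to max_run, that would suggest the parameter never reached the training job.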
Screenshots or logs
See screenshot in description
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.207.1
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
- Framework version: 2.2.0
- Python version: 3.10.1
- CPU or GPU: CPU locally, and GPU instance on SageMaker
- Custom Docker image (Y/N): N
Additional context
NA
sarseniy