Description
I'm facing an issue with HyperDriveStepRun. I have been using HyperDriveStepRun to retrieve the best model in a pipeline, but it has recently started throwing a "Service invocation timed out" error. I'm somewhat puzzled, as the same pipeline worked in previous runs. I last ran the pipeline in Mar 2021, so I was expecting it to work. Would you have any idea what the error means?
The pipeline:
Step 3, hyperdrive_save_best_model.py
The error is thrown in Step 3 at this line:
step_run = HyperDriveStepRun(step_run=pipeline_run.find_step_run('model_training_with_hyperdrive')[0])

I'm able to find the step in the pipeline by running pipeline_run.find_step_run('model_training_with_hyperdrive')[0].get_details(), which returns the output below:
{'runId': '81afda1d-e09a-4b66-9916-a70d31fe5aaf', 'status': 'Completed', 'startTimeUtc': '2021-04-26T10:04:55.388077Z', 'endTimeUtc': '2021-04-26T12:02:13.912549Z', 'properties': {'ContentSnapshotId': 'f60b2327-8e65-4245-ae9d-ad1d15052db9', 'StepType': 'HyperDriveStep', 'ComputeTargetType': 'HyperDrive', 'azureml.moduleid': '1997d65f-e755-4b48-b7b6-802cebe0e932', 'azureml.runsource': 'azureml.StepRun', 'azureml.nodeid': '1b54a211', 'azureml.pipelinerunid': '4305e2b4-5695-4736-b081-6710b3e7cb64'}, 'inputDatasets': [], 'logFiles': {'logs/azureml/executionlogs.txt': 'https://cerespalmml0992777240.blob.core.windows.net/azureml/ExperimentRun/dcid.81afda1d-e09a-4b66-9916-a70d31fe5aaf/logs/azureml/executionlogs.txt?sv=2019-02-02&sr=b&sig=fQHeUuUTSKkh7RXUMDIr59sqvwXhzWVPBNXp4vaQ%2F%2F4%3D&st=2021-04-26T11%3A57%3A26Z&se=2021-04-26T20%3A07%3A26Z&sp=r', 'logs/azureml/stderrlogs.txt': 'https://cerespalmml0992777240.blob.core.windows.net/azureml/ExperimentRun/dcid.81afda1d-e09a-4b66-9916-a70d31fe5aaf/logs/azureml/stderrlogs.txt?sv=2019-02-02&sr=b&sig=Re57T4Nx1MERInO8xXTGQ7gY%2BFmxfpQKklTPSbV40ig%3D&st=2021-04-26T11%3A57%3A26Z&se=2021-04-26T20%3A07%3A26Z&sp=r', 'logs/azureml/stdoutlogs.txt': 'https://cerespalmml0992777240.blob.core.windows.net/azureml/ExperimentRun/dcid.81afda1d-e09a-4b66-9916-a70d31fe5aaf/logs/azureml/stdoutlogs.txt?sv=2019-02-02&sr=b&sig=cQ5eMV8tG%2BsmJDAd9AYvtNkvVssq3YdcP4CGIsQn6MA%3D&st=2021-04-26T11%3A57%3A26Z&se=2021-04-26T20%3A07%3A26Z&sp=r'}}
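Since the step itself reports 'Completed' and the same pipeline worked before, the timeout may be transient on the service side. One way to check that is to retry the failing constructor call a few times with a growing delay. The sketch below is only an illustration under that assumption: call_with_retries is a hypothetical helper (not part of the Azure ML SDK), and it catches any Exception because the exact exception type raised for the timeout is not shown above.

```python
import time


def call_with_retries(make_call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run make_call(); if it raises, wait and try again, up to `attempts` times.

    Hypothetical helper for illustration. The delay doubles after each failure
    (base_delay, 2 * base_delay, ...). The last error is re-raised if every
    attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return make_call()
        except Exception as error:  # assumption: the timeout surfaces as an exception
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Usage in Step 3 (requires the azureml SDK and an existing pipeline_run):
#
# from azureml.pipeline.steps import HyperDriveStepRun
#
# found = pipeline_run.find_step_run('model_training_with_hyperdrive')[0]
# step_run = call_with_retries(lambda: HyperDriveStepRun(step_run=found), base_delay=30.0)
```

If the call still fails after several attempts, the problem is probably not a momentary service hiccup, and the full traceback would be more useful than further retries.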
Originally posted by @julianaddison in #269 (comment)

