
Do trials resume automatically? #439

Open
@keunwoochoi

Description


Hi, thanks for the great library. I'm trying to use keras-tuner for the first time.

My current computing environment is quite unstable, so I set up my terminal to re-execute the Python command that runs the keras-tuner search whenever the process is terminated. This produced the TensorBoard results below.
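One way to get the automatic re-execution described above is a small supervisor loop. The sketch below assumes the search lives in its own script. run_until_success is a hypothetical helper, not part of keras-tuner, and "search.py" in the usage comment is a placeholder. Unlike a loop that restarts unconditionally, it stops once the command exits with status 0 or a restart budget is used up.

```python
import subprocess
import sys
import time


def run_until_success(args, max_restarts=10, delay_seconds=0.0):
    """Re-run `python <args>` until it exits with status 0.

    Gives up after `max_restarts` restarts (max_restarts + 1 attempts in total).
    Returns (returncode, attempts). Hypothetical helper, not part of keras-tuner.
    """
    attempts = 0
    while True:
        attempts += 1
        # Use the current interpreter so the same environment is reused.
        result = subprocess.run([sys.executable, *args])
        if result.returncode == 0 or attempts > max_restarts:
            return result.returncode, attempts
        # Give the environment a moment to recover before restarting.
        time.sleep(delay_seconds)


# Usage ("search.py" is a placeholder for the script that calls tuner.search()):
#   run_until_success(["search.py"], max_restarts=100, delay_seconds=5.0)
```

A supervisor like this only restarts the process. Whether the restarted search picks up where it left off is up to the tuner, which is the question raised here.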

[Screenshot: TensorBoard results of the search]

There are only two trials, and the second one seems to be stopped and re-run over and over. (By the way, I'm passing a fixed seed when instantiating the tuner.)

Based on the implementations of Oracle.create_trial(), BaseTuner.search(), and BaseTuner.on_trial_begin(trial), together with the behavior I'm seeing, I think keras-tuner does not resume a trial whose training was stopped partway through.
Am I correct? And what would be a fix? It seems that implementing resume-from-the-latest-checkpoint would require quite a lot of changes (subclassing the tuner classes, overriding several methods, etc.), which would only be a temporary workaround. But I'm not sure whether there is any other solution.
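The core of the resume-from-the-latest-checkpoint idea can be sketched independently of keras-tuner. In this minimal sketch, train_with_resume and the checkpoint.json layout are hypothetical and not keras-tuner's API. Each trial is assumed to have its own directory, and progress is saved after every epoch so that a re-executed process skips the epochs that already finished. Wiring this into a tuner subclass would also require the restarted search to hand back the same interrupted trial, which this sketch does not address.

```python
import json
import os
import tempfile


def train_with_resume(trial_dir, total_epochs, train_one_epoch):
    """Call train_one_epoch(epoch, state) for each epoch, resuming after a crash.

    Progress is written to <trial_dir>/checkpoint.json after every epoch.
    Hypothetical helper; not part of keras-tuner.
    """
    os.makedirs(trial_dir, exist_ok=True)
    ckpt_path = os.path.join(trial_dir, "checkpoint.json")

    # Load the latest checkpoint if an earlier run of this trial left one.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            ckpt = json.load(f)
    else:
        ckpt = {"next_epoch": 0, "state": {}}

    for epoch in range(ckpt["next_epoch"], total_epochs):
        ckpt["state"] = train_one_epoch(epoch, ckpt["state"])
        ckpt["next_epoch"] = epoch + 1
        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(ckpt, f)
        os.replace(tmp_path, ckpt_path)

    return ckpt["state"]


if __name__ == "__main__":
    trial_dir = tempfile.mkdtemp()
    ran = []

    def crash_at_epoch_2(epoch, state):
        # Simulate the process dying partway through the trial.
        if epoch == 2:
            raise RuntimeError("simulated crash")
        ran.append(epoch)
        return {"last_epoch": epoch}

    def normal(epoch, state):
        ran.append(epoch)
        return {"last_epoch": epoch}

    try:
        train_with_resume(trial_dir, 4, crash_at_epoch_2)
    except RuntimeError:
        pass  # the first run finished epochs 0 and 1 before dying
    print(train_with_resume(trial_dir, 4, normal))  # {'last_epoch': 3}
    print(ran)  # [0, 1, 2, 3]
```

Saving after every epoch trades some I/O for losing at most one epoch of work per interruption. A real model would store weights alongside the epoch counter rather than a small JSON state.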

Metadata

Assignees

No one assigned

    Labels

    question (Further information is requested)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests