Description
Hi, thanks for the great library. I'm trying to use keras-tuner for the first time.
My current computing environment is quite unstable, so in my terminal I have the Python command that runs the keras-tuner search re-executed whenever the process is terminated. After doing so, I got TensorBoard results like the ones below.
There are only two trials, and the second one seems to be stopped and re-run again and again. (By the way, I'm passing a fixed seed when instantiating the tuner.)
From the implementations of Oracle.create_trial(), BaseTuner.search(), and BaseTuner.on_trial_begin(trial), and from the behavior I'm seeing, I think keras-tuner does not resume a trial if training was stopped in the middle of it.
Am I correct? If so, what would be a fix? To implement resuming from the latest checkpoint, it seems I would have to make quite a lot of changes (subclassing the tuner classes, overriding several methods, etc.), which would only be a temporary solution. I'm not sure whether there's any other solution.
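To make the idea concrete, here is a rough sketch of the framework-independent part of the workaround I have in mind: given a per-trial directory, find the newest checkpoint and the epoch to resume from. The function name latest_checkpoint and the file naming scheme epoch_NNNN.weights.h5 are my own assumptions for illustration, not anything keras-tuner provides.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed (hypothetical) naming scheme: one weights file per finished epoch,
# numbered from 1, e.g. "epoch_0003.weights.h5" after the third epoch.
CKPT_PATTERN = re.compile(r"^epoch_(\d+)\.weights\.h5$")


def latest_checkpoint(trial_dir) -> Tuple[Optional[Path], int]:
    """Return (newest checkpoint path, number of epochs already completed).

    Returns (None, 0) when the directory is missing or has no checkpoint,
    which means the trial has to start from scratch.
    """
    trial_dir = Path(trial_dir)
    if not trial_dir.is_dir():
        return None, 0

    best_path, best_epoch = None, 0
    for path in trial_dir.iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match is None:
            continue  # ignore logs, trial metadata, and other files
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_path, best_epoch = path, epoch
    return best_path, best_epoch
```

In an overridden run_trial (or HyperModel.fit), I would then presumably load the weights from the returned path and pass the returned epoch count as initial_epoch to model.fit, but wiring that into the tuner is exactly the part that seems to require the subclassing I mentioned above.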