Run specific code only once (which generates randomized values) before starting DDP #9134
-
Hi. I have a function that generates a set of random values (hyperparameters) which are then used to create my model. I want to run this function only once, use its output to create my model, and then start DDP training on that model. However, with the current setup, when I start DDP the randomize function gets called again, so I end up with 2 GPU processes, each having initialized the model with a different set of hyperparameters (the random values from the two calls are not the same). If I add ...
-
If that isn't too costly to generate, I'd recommend generating them on every process and then using DDP broadcasting to overwrite the values with the ones from the main process (src=0).
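In case it helps, here is a minimal sketch of that pattern using torch.distributed.broadcast_object_list. The generate_hparams helper and MyModel are hypothetical stand-ins for your own code, and the call assumes the process group is already initialized (as it is once DDP training has started):

import random

import torch.distributed as dist


def generate_hparams():
    # hypothetical stand-in for the function that draws randomized hyperparameters
    return {"num_layers": random.randint(10, 100), "lr": 10 ** random.uniform(-5, -2)}


# every process draws its own candidate values (cheap to regenerate) ...
hparams = [generate_hparams()]

# ... and then overwrites them with the values drawn on the main process (src=0),
# so all ranks end up building the model from identical hyperparameters
dist.broadcast_object_list(hparams, src=0)
hparams = hparams[0]

model = MyModel(**hparams)  # hypothetical model class taking these hyperparameters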
-
Hey @Gateway2745,
Here is an example where I am broadcasting the current checkpoint tmpdir to all processes: https://github.com/PyTorchLightning/pytorch-lightning/blob/522df2b89b35c050b14bb5e9c2ba2c3d1d20ea67/tests/core/test_metric_result_integration.py#L468
Best, tchaton
-
Hello @tchaton. Thank you for the helpful example. I have a quick question: since I would need to broadcast my hyperparameters (stored in a Python dictionary) before I even create my model, where would be the best place to do this? I guess I would need to do it before ...
-
Hey @Gateway2745,

You could do this:

from unittest import mock

import optuna
from pytorch_lightning import LightningModule
from pytorch_lightning.utilities.cli import LightningCLI

config_path = ...


class MyModel(LightningModule):
    def __init__(self, num_layers):
        ...


def objective(trial):
    # suggest an integer so the CLI can parse it into the model's num_layers argument
    num_layers = trial.suggest_int("num_layers", 10, 100)
    with mock.patch("sys.argv", ["any.py", "--config", str(config_path), "--trainer.accelerator", "ddp_spawn", "--trainer.gpus", "2", "--model.num_layers", str(num_layers)]):
        cli = LightningCLI(MyModel, MyDataModule)
    return cli.trainer.checkpoint_callback.best_model_score


study = optuna.create_study()
study.optimize(objective, n_trials=100)
study.best_params
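As far as I understand, this sidesteps the original problem because the hyperparameters are fixed once in the parent process (inside objective, via sys.argv) before the Trainer spawns the ddp_spawn workers, so every GPU process builds the model with the same num_layers and no explicit broadcasting is needed.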