Add capability to perform multiple runs, possibly with a parameter sweep #35

Open
jopetty opened this issue Nov 24, 2020 · 3 comments
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request), v2 (Version 2, with Hydra)


@jopetty
Member

jopetty commented Nov 24, 2020

Currently, there is a one-to-one correspondence between issuing the python main.py ... command and producing a single model. Ideally, though, an experiment should encapsulate both a parameter sweep (e.g., train SRN-SRN, GRU-GRU, and Transformer-Transformer models) and multiple runs of any given parameter combination (e.g., do 10 runs for each set of hyperparameters so we can average performance). This means we need a way to launch multiple jobs and to specify parameter sweeps in a config file.
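
Once the repeated runs exist, averaging them is mostly bookkeeping. Below is a minimal sketch, assuming each run reports a single scalar score together with the hyperparameter dict it was launched with. summarize_runs and this record shape are hypothetical, not existing project code.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_runs(results):
    """Average repeated runs per hyperparameter combination (hypothetical helper).

    `results` is an iterable of (config, score) pairs: `config` is a dict of
    hyperparameters, e.g. {"encoder": "GRU", "decoder": "GRU"}, and `score`
    is one run's metric. Returns {config_key: (mean, stdev, n_runs)}.
    """
    grouped = defaultdict(list)
    for config, score in results:
        # Dicts are unhashable, so key on a sorted tuple of their items;
        # this also makes the key independent of dict insertion order.
        key = tuple(sorted(config.items()))
        grouped[key].append(score)

    summary = {}
    for key, scores in grouped.items():
        # statistics.stdev needs at least two samples.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[key] = (mean(scores), spread, len(scores))
    return summary
```

Keying on sorted (name, value) pairs means two runs with the same hyperparameters land in the same bucket however their configs were constructed.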

For sweeping: Hydra has an Ax Sweeper plugin, which seems to let parameter sweeps be defined in the YAML config.

For multiruns: maybe look at the Joblib Launcher plugin.

jopetty added the documentation, enhancement, and v2 labels on Nov 24, 2020
jopetty self-assigned this on Nov 24, 2020
@jopetty
Member Author

jopetty commented Nov 24, 2020

Also of note: there is a Submitit Launcher plugin, which automatically runs sessions as SLURM jobs. It would be worth checking whether it could be used to submit jobs on the GRACE cluster.

jopetty added this to the Stable 1.0 milestone on Jan 4, 2021
@jopetty
Member Author

jopetty commented Jan 9, 2021

It looks like multirun support is partly provided out of the box; the -m flag allows one to sweep over parameters like:

python train.py -m model=model1,model2

but this has some problems with the custom output directory structure we've created.
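
With -m, Hydra's basic sweeper treats each comma-separated override as one axis and launches a job for every combination. The sketch below is a simplified stand-in for that expansion, not Hydra's implementation. It does no quoting or escaping and ignores the override grammar's other sweep forms. The seed override is a hypothetical parameter, included to show how repeated runs per model (as asked for in the first comment) fall out of the same mechanism.

```python
from itertools import product


def expand_overrides(overrides):
    """Expand "key=v1,v2,..." overrides into one override list per job.

    Simplified: splits on the first "=" and on every "," only.
    """
    axes = []
    for item in overrides:
        key, _, values = item.partition("=")
        axes.append([f"{key}={v}" for v in values.split(",")])
    # Cartesian product: every combination of values becomes one job.
    return [list(combo) for combo in product(*axes)]


# "seed" is a hypothetical override used only for illustration.
jobs = expand_overrides(["model=model1,model2", "seed=0,1,2"])
for job in jobs:
    print(" ".join(job))
```

Two models times three seeds gives six jobs, so the number of launched runs grows multiplicatively with every swept parameter.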

@jopetty
Member Author

jopetty commented Feb 2, 2021

Multirun directory structure has been fixed in 33f9ce5. Now it looks like:

outputs/
  experiment/
    model/
      DATE_TIME/
        RUN_1/
        RUN_2/
        ....
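
If downstream scripts need to locate every run of a sweep (for example, to average them, as the first comment asks), the layout can be reproduced in a few lines. This is a sketch only. run_dirs is hypothetical, and the timestamp format and RUN_i naming stand in for the DATE_TIME and RUN_n placeholders above; neither is taken from 33f9ce5.

```python
from datetime import datetime
from pathlib import PurePosixPath


def run_dirs(root, experiment, model, started, n_runs):
    """Return per-run directories laid out as root/experiment/model/DATE_TIME/RUN_i.

    The "%Y-%m-%d_%H-%M-%S" timestamp and RUN_{i} naming are assumptions.
    """
    stamp = started.strftime("%Y-%m-%d_%H-%M-%S")
    sweep_dir = PurePosixPath(root) / experiment / model / stamp
    # All runs of one sweep share a single timestamped parent directory.
    return [sweep_dir / f"RUN_{i}" for i in range(1, n_runs + 1)]


for d in run_dirs("outputs", "experiment", "model", datetime(2021, 2, 2, 14, 30), 2):
    print(d)
```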
