Learning rate annealing: key errors #124

Open
gogobd opened this issue Jul 9, 2020 · 3 comments

gogobd commented Jul 9, 2020

I was trying to anneal / cool off the learning rate, following the examples for building models from scratch, but when I run

# python train.py --hps=small_vqvae,small_upsampler,all_fp16,cpu_ema --name=small_upsampler --sample_length=262144 --bs=4 --audio_files_dir=[………] --labels=False --train --test --aug_shift --aug_blend --restore_vqvae=logs/small_vqvae/checkpoint_latest.pth.tar --prior --levels=2 --level=0 --weight_decay=0.01 --save_iters=1000 --restore_prior=logs/small_prior/checkpoint_latest.pth.tar --lr_use_linear_decay --lr_start_linear_decay=547 --lr_decay=18

I get

RuntimeError: Error(s) in loading state_dict for SimplePrior:
Missing key(s) in state_dict: "conditioner_blocks.0.x_emb.weight", "conditioner_blocks.0.cond.model.0.weight", "conditioner_blocks.0.cond.model.0.bias", "conditioner_blocks.0.cond.model.1.0.blocks.0.model.1.weight", [………], "conditioner_blocks.0.cond.model.3.1.weight", "conditioner_blocks.0.cond.model.3.1.bias", "conditioner_blocks.0.ln.weight", "conditioner_blocks.0.ln.bias".
> /opt/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py(830)load_state_dict()

I don't know which other file I would have to restore (I'm using 'logs/small_prior/checkpoint_latest.pth.tar') - maybe someone can help me with this?
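One way to see what is actually stored in a checkpoint before restoring it is to load it with plain PyTorch and list its keys. This is only a minimal diagnostic sketch; it assumes the .pth.tar file is a regular torch.save() dict and that the weights live under a "model" entry, which may not match Jukebox's exact checkpoint layout.

```python
import torch

# Load the checkpoint on CPU and inspect its contents.
# Assumption: the file is a plain torch.save() dict; the "model" key is a
# guess and may differ in Jukebox's checkpoint format.
ckpt = torch.load("logs/small_prior/checkpoint_latest.pth.tar", map_location="cpu")
print(ckpt.keys())

state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

# Print any keys that look like the conditioner blocks the error says are missing.
for name in state_dict:
    if "conditioner_blocks" in name:
        print(name)
```

If no conditioner_blocks keys show up, the checkpoint you are restoring simply does not contain the upsampler's conditioner weights, which would explain the missing-key error.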


ObscuraDK commented Jul 12, 2020

I have been using these parameters. I don't know if they are right, but I have started to get a pretty decent output.

mpiexec -n 3 python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=pretrained_vqvae_small_prior --sample_length=1048576 --bs=4 --aug_shift --aug_blend --audio_files_dir=/home/vertigo/jukebox/learning2 --labels=False --train --test --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000 --restore_prior=/home/vertigo/jukebox/logs/pretrained_vqvae_small_prior/checkpoint_latest.pth.tar --lr_use_linear_decay --lr_start_linear_decay=0 --lr_decay=0.9


btrude commented Jul 12, 2020

--lr_decay=0.9

@ObscuraDK This means you are only invoking the lr decay for the last fraction of a step during training (i.e., this is essentially doing nothing unless you are working with an absolutely massive dataset). 1 step = 1/x iteration during training. I can't speak to what the effective number of steps is, given that I don't know what you are training on, but I am finding that with small datasets the lr is probably too high by default, and the decay should maybe persist for the entire duration of training (though I have yet to test this). You might find this link helpful for understanding what's going on here: https://www.jeremyjordan.me/nn-learning-rate/
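For illustration, here is a rough sketch of how a linear decay schedule along these lines behaves. It assumes, as the comment above implies, that lr_start_linear_decay is the step at which decay begins and lr_decay is the number of steps over which the lr falls linearly to zero; the base lr of 0.0003 and the exact formula are assumptions, not the actual jukebox/train.py implementation.

```python
def linear_decay_lr(step, base_lr=0.0003, lr_start_linear_decay=0, lr_decay=0.9):
    """Sketch of a linear decay schedule.

    Assumption: lr_decay is the number of steps over which the learning
    rate falls linearly to zero once decay has started. This mirrors the
    interpretation in the comment above, not the exact Jukebox code.
    """
    steps_into_decay = max(0.0, step - lr_start_linear_decay)
    scale = max(0.0, 1.0 - steps_into_decay / lr_decay)
    return base_lr * scale

# With lr_decay=0.9 the lr hits zero before the first full step completes:
print(linear_decay_lr(0))                       # 0.0003
print(linear_decay_lr(1))                       # 0.0
# A decay window covering the whole run keeps the lr meaningful much longer:
print(linear_decay_lr(5000, lr_decay=100000))   # ~0.000285
```

Under that reading, lr_decay should be set on the order of the total number of training steps you expect, not a value below 1.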

@leonardog27

We are doing prior level 2 training using Colab. What kind of special setup do we need for the dataset for lyrics and non-lyrics training? We have a group on Discord: https://discord.com/invite/6At7WwM
