Learning rate annealing: key errors #124
I have been using these parameters. I don't know if they are right, but I have started to get a pretty decent output:

```
mpiexec -n 3 python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=pretrained_vqvae_small_prior --sample_length=1048576 --bs=4 --aug_shift --aug_blend --audio_files_dir=/home/vertigo/jukebox/learning2 --labels=False --train --test --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000 --restore_prior=/home/vertigo/jukebox/logs/pretrained_vqvae_small_prior/checkpoint_latest.pth.tar --lr_use_linear_decay --lr_start_linear_decay=0 --lr_decay=0.9
```
@ObscuraDK This means you are only invoking the lr decay for the last fraction of a step during training (i.e., this is essentially doing nothing unless you are working with an absolutely massive dataset). 1 step = 1/x iteration during training. I can't speak to what the effective number of steps is, since I don't know what you are training on, but I am finding that with small datasets the lr is probably too high by default, and the decay should perhaps persist for the entire duration of training (though I have yet to test this). You might find this link helpful for understanding what's going on here: https://www.jeremyjordan.me/nn-learning-rate/
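To make the point above concrete, here is a minimal sketch of a linear lr-decay multiplier, assuming (as the comment suggests) that `--lr_decay` is measured in steps and gives the span over which the lr falls linearly to zero once `--lr_start_linear_decay` is reached. The function name and exact parameter semantics are illustrative assumptions, not Jukebox's actual implementation:

```python
def linear_decay_scale(step, lr_start_linear_decay, lr_decay):
    """Linear lr multiplier: 1.0 until decay starts, then falls
    linearly to 0 over lr_decay steps (clamped at 0).
    NOTE: hypothetical helper illustrating the flags discussed above."""
    return max(0.0, 1.0 - max(0.0, step - lr_start_linear_decay) / lr_decay)

# With --lr_start_linear_decay=0 --lr_decay=0.9, the decay window is
# shorter than a single step, so the lr collapses to ~0 immediately:
print(linear_decay_scale(0, 0, 0.9))  # 1.0 (before any steps)
print(linear_decay_scale(1, 0, 0.9))  # 0.0 (already fully decayed)
```

This is why a tiny `--lr_decay` with `--lr_start_linear_decay=0` either kills the lr at once or, depending on total step count, effectively does nothing useful; for a small dataset you would want the decay window to cover most or all of training.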
We are training a level-2 prior using Colab. We have a group on Discord called
I was trying to anneal / cool off the learning rate following the examples for building models from scratch, but when I try

but I get

I don't know which other file I would have to restore (I'm using `logs/small_prior/checkpoint_latest.pth.tar`). Maybe someone can help me with this?