Resume from the latest pickle #6
Conversation
You know, I was about to question this, since on every resume the augmentation strength always starts at 0.0.
I have enabled the option to manually set the initial augmentation strength, but I don't know if it is the right way to tackle the issue. For long training sessions on Colab, it may not be a big issue, as the strength is supposed to increase quite fast. Here is a quote from the article (section 3 on page 5):

For now, I will keep the initial strength set to 0. In my latest experiment, a strength of 0.5 was reached after about an hour. It took 5 hours to reach a strength equal to 1! This happened with ~425k img, which matches:
It might be worth setting the initial strength when resuming. It would have to be done manually then.
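A minimal sketch of what manually setting the initial strength on resume could look like, assuming an augmentation object with a `strength` attribute as in the ADA training loop; the helper name and the validation logic are my own, not code from this PR:

```python
def apply_initial_aug_strength(aug, initial_strength=0.0):
    """Assign a user-supplied starting augmentation strength before
    training resumes (illustrative sketch, not the PR's actual code).

    ADA keeps the strength in [0, 1], so reject values outside that range.
    """
    if not 0.0 <= initial_strength <= 1.0:
        raise ValueError("initial augmentation strength must be in [0, 1]")
    aug.strength = initial_strength
    return aug
```

With `--resume`, the value would be passed once at startup and the ADA heuristic would take over from there.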
@woctezuma Did you solve the issue of the augmentation strength going above 1? (In tick 58 of the image you posted, it was 1.005.)
Actually, I have scrapped this training run, because I had messed up the mapping net depth. Anyway, in my subsequent training runs, I have fixed the value of the mapping net depth, and I have also deactivated EMA ramp-up, and the augmentation strength has never gone over 1 (or close to 1).

For most of the training run, the augmentation strength was stable around 0.6, usually a bit higher. After 5000 kimg, when I stopped the training run to analyze the results, the augmentation strength was at 0.736.

I think the culprit was the EMA ramp-up, but I cannot say for sure, because I simultaneously changed a few settings and have not run many experiments.
@woctezuma Thanks for your insight. I think the ramp-up can't be the culprit, as I was using the following cfg with the ramp deactivated:

`dict(ref_gpus=1, kimg=25000, mb=4, mbstd=4, fmaps=1, lrate=0.001, gamma=10, ema=10, ramp=None, map=8)`

I've done a couple more experiments, and I've realized that using "bgc" instead of "bg" for the augmentation pipeline slows down the increase of the augmentation strength a lot. However, from an old training run, I've also observed that the augmentation strength can still go crazy above 1 in the final stages of training, as overfitting is prone to occur at those stages. So, I think clipping the augmentation strength to some value below 1 (probably to the target value) could be beneficial. I think you don't run into these problems with FFHQ, but with more specific datasets you always run into many problems :S
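The clipping idea above could be sketched as follows. ADA adjusts the strength each tick by a signed step depending on an overfitting heuristic `rt`; clamping the result to the target (0.6 in the default configs) instead of 1.0 is the proposed change. The function signature and step size here are assumptions for illustration, not the library's actual API:

```python
def adjust_aug_strength(strength, rt, ada_target=0.6, step=0.01,
                        clip_to_target=True):
    """One ADA-style adjustment step (sketch).

    Raise the strength when the overfitting heuristic rt exceeds the
    target, lower it otherwise, then clamp. With clip_to_target=True,
    the strength can never exceed ada_target, implementing the
    suggestion above; otherwise it is only capped at 1.0.
    """
    strength += step if rt > ada_target else -step
    upper = ada_target if clip_to_target else 1.0  # proposed clipping
    return min(max(strength, 0.0), upper)
```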
@woctezuma Just as a side note, do you monitor losses using TensorBoard? I'm having a hard time using it with Colab :(
You could be right. Edit: I see #27 is about that issue.
I am not good at monitoring the losses in Colab. I deactivated the metrics during the training run, and I manually check them for the major snapshots, lately every 1000 kimg (but, in my opinion, I should have done it every 100 kimg).
@woctezuma Yeah, I am currently freezing the first 3 layers. How many layers are you freezing? In this paper, they found 4 to be the optimal number for StyleGAN2: https://arxiv.org/pdf/2002.10964.pdf

What metric do you use? For me, fid50k_full takes too much time, so I actually modified it to be a fid5k.
I use [...]. I might be wrong though, because I have only skimmed through the paper.
None. I don't monitor metrics at all during training. I only check them manually afterwards, at manually defined milestones.
Closing in favour of the PyTorch implementation here: NVlabs/stylegan2-ada-pytorch#3
Hello,
I know you don't accept pull requests. However:
- I have added the ability to resume from the latest `.pkl` file with the command-line argument `--resume=latest`. The value of `cur_nimg` is inferred from the file name.
- I have yet to figure out how to automatically compute the relevant value of `aug.strength` to resume from.
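The resume logic described above could be sketched roughly like this. It relies on the `network-snapshot-<kimg>.pkl` naming convention used by StyleGAN2-ADA snapshots; the helper name `find_latest_pkl` is hypothetical, not necessarily what this PR uses:

```python
import glob
import os
import re

def find_latest_pkl(run_dir):
    """Locate the snapshot pickle with the highest kimg count and infer
    cur_nimg from its file name (hypothetical helper, sketch only).

    Snapshots are assumed to be named e.g. network-snapshot-000400.pkl,
    where the number is the training progress in kimg.
    """
    pattern = re.compile(r"network-snapshot-(\d+)\.pkl$")
    best_path, best_kimg = None, -1
    for path in glob.glob(os.path.join(run_dir, "network-snapshot-*.pkl")):
        match = pattern.search(os.path.basename(path))
        if match and int(match.group(1)) > best_kimg:
            best_kimg = int(match.group(1))
            best_path = path
    if best_path is None:
        raise FileNotFoundError(f"No snapshot pickle found in {run_dir}")
    # kimg counts thousands of images, so cur_nimg = kimg * 1000.
    return best_path, best_kimg * 1000
```

This is also why the augmentation strength is harder to recover automatically: unlike `cur_nimg`, it is not encoded in the snapshot file name.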