
Fine tuning losses #89

Closed
snakch opened this issue Apr 9, 2021 · 5 comments

snakch commented Apr 9, 2021

Hello! Thank you for making this codebase open-source, it's great!

I'm having the following issue: I'm fine-tuning the ffhq model on my own dataset. Since I'm training on Colab, I have to do this piecewise, so I end up training for as long as possible and then restarting from the latest snapshot.

The problem is that when I look at the losses, they seem to start from scratch every time. I've included a screenshot of the losses for two subsequent runs. I call train.py with the following arguments (other than the snapshot and data paths):

--augpipe=bg --gamma=10 --cfg=paper256 --mirror=1 --snap=10 --metrics=none

Would you say this is normal? If so, what's the best way to get a sense of progress (other than manually inspecting outputs)? Thanks!

[Screenshot from 2021-04-09 09-25-05: loss curves for two subsequent runs]
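(For reference, one way to track progress across piecewise runs is to stitch together the stats.jsonl files that train.py writes into each results directory. A minimal sketch, assuming each line is a JSON dict where entries such as "Loss/G/loss" and "Progress/kimg" carry a "mean" field; the run directory names below are hypothetical:)

```python
# Sketch: concatenate loss curves from several resumed runs.
# Assumes stats.jsonl lines look like
# {"Loss/G/loss": {"mean": ...}, "Progress/kimg": {"mean": ...}, ...}
import json
import matplotlib.pyplot as plt

run_dirs = ['results/00000-run', 'results/00001-run']  # hypothetical paths

kimg, g_loss = [], []
offset = 0.0  # cumulative kimg, since each resumed run restarts the counter at 0
for run_dir in run_dirs:
    last = 0.0
    with open(f'{run_dir}/stats.jsonl') as f:
        for line in f:
            rec = json.loads(line)
            last = rec['Progress/kimg']['mean']
            kimg.append(offset + last)
            g_loss.append(rec['Loss/G/loss']['mean'])
    offset += last  # shift the next run so the curve is continuous

plt.plot(kimg, g_loss)
plt.xlabel('cumulative kimg')
plt.ylabel('Loss/G/loss (mean per tick)')
plt.show()
```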


ink1 commented Apr 9, 2021

I don't think it is normal (I'm in the same boat). What I noticed is that if I improve FID from, say, 100 to 50, it jumps back to 90 or so upon resume. To be clear, FID will be 50 immediately on resume, but it then veers off to 90 over several ticks and only then starts to gradually subside. So I strongly suspect this behaviour is due to the training schedule, Adam, or both. In stylegan2-ada, you could start with kimg 10000 or whatever, thus entering a fine-tuning regime. If anyone knows how to tweak the training schedule, please share! Thanks
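For reference, the resume path in the PyTorch training_loop.py only seems to copy the network weights from the pickle; roughly like this (paraphrased from memory, so details may differ):

```python
# Paraphrase of the resume logic in stylegan2-ada-pytorch training/training_loop.py
# (from memory; check the repo for the exact code). Only the module weights are
# copied -- cur_nimg, the ADA strength augment_pipe.p, and the Adam optimizer
# states all start from scratch, which would explain the jump after resuming.
import dnnlib
import legacy
from torch_utils import misc

def load_resume_weights(resume_pkl, G, D, G_ema):
    with dnnlib.util.open_url(resume_pkl) as f:
        resume_data = legacy.load_network_pkl(f)
    for name, module in [('G', G), ('D', D), ('G_ema', G_ema)]:
        misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
    return resume_data
```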


ink1 commented Apr 10, 2021

Sorry, I meant to say in StyleGAN2 rather than StyleGAN2-ADA:
https://github.com/NVlabs/stylegan2/blob/master/training/training_loop.py#L132
which then goes into training_schedule:
https://github.com/NVlabs/stylegan2/blob/master/training/training_loop.py#L47
But StyleGAN2-ADA, including the PyTorch version, has no such training schedule.
#3 is trying to address some resume issues, but I am not convinced it addresses the main one: immediate divergence on resume.
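Roughly how that works in the TF code linked above (paraphrased from memory; the sched field names are assumptions, see the links for the real thing):

```python
# Paraphrase (from memory) of the TF StyleGAN2 training loop linked above.
# The image counter is seeded from resume_kimg, so training_schedule() resumes
# the late-stage schedule instead of restarting at zero.
def run_training_schedule(resume_kimg, total_kimg, training_set, sched_args, training_schedule):
    cur_nimg = int(resume_kimg * 1000)
    while cur_nimg < total_kimg * 1000:
        sched = training_schedule(cur_nimg=cur_nimg, training_set=training_set, **sched_args)
        # ... one training iteration using sched (learning rates, minibatch size, ...) ...
        cur_nimg += sched.minibatch_size  # assumed field name
```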


snakch commented Apr 10, 2021

OK, nice find! I actually think the above has a good chance of fixing the issue.

I suspect that the immediate divergence you're seeing is due to the augment strength being reset to 0 when resuming. The PR above seems to fix that. I'm going to give it a go and see where it takes me.
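For what it's worth, the snapshot pickles also store the augmentation pipeline, so a fix along these lines could carry the ADA strength over on resume. A rough sketch of the idea (not the actual code in #3; load_resume_weights above returns the unpickled resume_data):

```python
import torch

# Sketch: restore the ADA strength from the snapshot so it doesn't restart at 0
# and ramp up again. Assumes the pickle stores 'augment_pipe' and that
# augment_pipe.p is the torch buffer holding the strength.
def restore_augment_strength(augment_pipe, resume_data):
    resumed_pipe = resume_data.get('augment_pipe', None)
    if augment_pipe is not None and resumed_pipe is not None:
        with torch.no_grad():
            augment_pipe.p.copy_(resumed_pipe.p)
```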


ink1 commented Apr 10, 2021

I doubt that changing the augmentation strength is going to help in this case, but it is easy to test.
Perhaps EMA rampup could be more useful.
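For context, the G_ema update in the PyTorch training loop looks roughly like this (paraphrased from memory; the defaults here are placeholders). With ema_rampup enabled, G_ema tracks G much more tightly while cur_nimg is small, which is exactly the situation right after a resume since the counter restarts at 0:

```python
import torch

# Paraphrase (from memory) of the G_ema update in stylegan2-ada-pytorch
# training/training_loop.py; check the repo for the exact code.
def update_g_ema(G, G_ema, cur_nimg, batch_size, ema_kimg=20, ema_rampup=None):
    ema_nimg = ema_kimg * 1000
    if ema_rampup is not None:
        ema_nimg = min(ema_nimg, cur_nimg * ema_rampup)  # short EMA horizon early on
    ema_beta = 0.5 ** (batch_size / max(ema_nimg, 1e-8))
    with torch.no_grad():
        for p_ema, p in zip(G_ema.parameters(), G.parameters()):
            p_ema.copy_(p.lerp(p_ema, ema_beta))
        for b_ema, b in zip(G_ema.buffers(), G.buffers()):
            b_ema.copy_(b)
```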


snakch commented Apr 15, 2021

Just as a heads-up: playing around with #3, I seem to be getting much better results, whether it's because of the augmentation strength or one of the other changes.

snakch closed this as completed Apr 15, 2021