Training resumes not as expected #81
@nurpax I see you are actively answering questions here, can you help me?
What augment values are you getting? I had similar problems after resuming training: the output images were distorted (rotated and with weird colors). At first I thought this was somehow caused by the training resuming from kimg = 0, as you mentioned, so I changed the code so that it reads the initial kimg value from the .pkl file name. Then I noticed that the augmentation parameter was increasing without limit, which probably caused the augmentations to leak into the output images, leading to the distortion. When I changed to fixed augmentation, the problem went away.
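For reference, a minimal sketch of the filename-parsing workaround described above. `kimg_from_snapshot` is a made-up helper name, and it assumes the default snapshot naming scheme `network-snapshot-<kimg>.pkl`:

```python
import re

def kimg_from_snapshot(path):
    # Pull the kimg counter out of a snapshot file name such as
    # 'network-snapshot-000600.pkl'; fall back to 0 if no counter is found.
    match = re.search(r'network-snapshot-(\d+)\.pkl$', path)
    return int(match.group(1)) if match else 0

print(kimg_from_snapshot('network-snapshot-000600.pkl'))  # -> 600
```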
@jpkos, yeah, same problems with the augmentation value. I tried fixed augmentation, but then my model didn't want to train, or trained very slowly (maybe because I have a small dataset), so I adjusted the augmentation parameters and turned off rotate90 and lumaflip. The images stopped rotating, but after 600 kimg they became very green and bright. It looks like the problem is the endlessly increasing augmentation value. I don't know how NVlabs did it; I don't have comparably powerful GPUs to run a lot of tests and reproduce their results.
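A sketch of one way to turn those transforms off: in stylegan2-ada-pytorch, train.py assembles the AugmentPipe from a spec dict (augpipe_specs), so adding a custom spec without rotate90 and lumaflip might look like the following. The spec name 'bgc-norot' is made up for illustration:

```python
# Sketch of a custom augmentation spec for train.py in stylegan2-ada-pytorch.
# It mirrors the built-in 'bgc' pipeline but drops rotate90 and lumaflip,
# the transforms blamed above for the rotated/discolored samples.
augpipe_specs = {
    'bgc-norot': dict(
        xflip=1, xint=1,                      # pixel blitting, no 90-degree rotations
        scale=1, rotate=1, aniso=1, xfrac=1,  # general geometric transforms
        brightness=1, contrast=1,             # color transforms, no luma flip
        hue=1, saturation=1,
    ),
}
```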
As I understand it, in StyleGAN2 (the previous version) there was a resume_kimg option for training_loop.py, which was not actually used by train.py. You can easily implement this yourself: just add resume_kimg as a parameter to train.py (with a default of 0), pass it over to training_loop.py, and then set cur_nimg = int(resume_kimg * 1000) instead of cur_nimg = 0 in training_loop.py.
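A minimal sketch of that change; the actual training iteration is elided, only the counter logic is shown, and train.py would additionally need a matching option forwarded to training_loop():

```python
# training_loop.py, sketched: accept resume_kimg and start the image
# counter from it, so tick numbering, snapshot names, and anything else
# keyed on cur_nimg continue from the resume point instead of zero.
def training_loop(resume_kimg=0, total_kimg=25000, batch_size=32):
    cur_nimg = int(resume_kimg * 1000)  # the original code has: cur_nimg = 0
    while cur_nimg < total_kimg * 1000:
        # ... one training iteration goes here ...
        cur_nimg += batch_size
```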
Your fix is what was done here: 64efea2 |
@woctezuma |
My GAN crashed and I was extremely annoyed, as I was experiencing the exact same issue, so I decided to read into the code. Setting the initial augmentation and kimg will not actually continue the training from where it last ran. The devs don't seem to care whether it crashes, as there is no proper resume code. I was able to modify the code and create a proper resume function. However, I will not be able to resume my first GAN, because my code was not added yet, so there is no way to pull the settings needed; but at least in the future I will be all good, with everything stored in the pickle file.
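For anyone wanting to do the same, here is a rough sketch of the idea, not the exact code from this comment: store the extra training state next to the networks in the snapshot pickle and read it back on resume. The helper names are made up, and plain picklable objects stand in for the real networks:

```python
import pickle

def save_resume_snapshot(path, G, D, G_ema, augment_pipe, cur_nimg, aug_p):
    # Made-up helper: bundle the training state (image counter and current
    # augmentation probability) into the snapshot so a later run can pick
    # up exactly where this one stopped.
    snapshot = dict(G=G, D=D, G_ema=G_ema, augment_pipe=augment_pipe,
                    resume_state=dict(cur_nimg=cur_nimg, aug_p=aug_p))
    with open(path, 'wb') as f:
        pickle.dump(snapshot, f)

def load_resume_state(path):
    # Returns the stored state, or safe defaults for old snapshots
    # that predate this change.
    with open(path, 'rb') as f:
        snapshot = pickle.load(f)
    return snapshot.get('resume_state', dict(cur_nimg=0, aug_p=0.0))
```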
Describe the bug
I use --resume <path to .pkl file> to resume training after stopping,
and I noticed a few differences from StyleGAN2:
first, after resuming, from the first tick the network begins retraining as if it were learning for the first time. It looks like it does not know about the previous training.
To Reproduce
Steps to reproduce the behavior:
*Here I added an isnap argument for image snapshots, and an rfid5k metric as a reduced fid50k for my small dataset.
In StyleGAN2 I could set the number of iterations in kimg, and resuming worked fine.
I found logs from StyleGAN2; after resuming, the metrics are fine there: https://ibb.co/xS52vDC
Expected behavior
I expected resuming to work as in StyleGAN2; I don't want to retrain from the beginning.
Maybe I can set a value to resume from, or maybe this is already implemented but not documented (I don't know).
Sorry if I offended you
Additional context
Maybe it's part of how ADA works?
I think you can help me.
P.S. Thank you for your work. The PyTorch implementation most likely required a lot of effort from you.
Ada is AWESOME.