Describe the bug
I use --resume <Path to .pkl file> to resume training after stopping,
and I notice a few differences from StyleGAN2:
First, after resuming, from the first tick the network begins retraining as if it were learning for the first time. It looks like it doesn't know about the previous training.
To Reproduce
Steps to reproduce the behavior:
- In the root directory, run the command 'python train.py --outdir=./training-runs --data=./dataset/f256.zip --gpus=1 --resume=./training-runs/00012-f256-auto1-kimg10000-batch4-resumecustom/network-snapshot-000600.pkl --batch 4 --kimg 10000 --snap 4 --isnap 2 --nobench=True --metrics=rfid5k'
*Here I added an isnap argument for image snapshots, and the rfid5k metric as a reduced fid50k for a small dataset.
- I assumed it took the kimg count from the file name, but I began to doubt this when, after initialization, the snapshot numbering reset to zero and the metric jumped up, from 125 to 336 for example: https://ibb.co/2sT8zY8
In StyleGAN2 I could set the number of iterations in kimg, and resuming worked fine.
I found logs from StyleGAN2; after resuming, the metrics are fine there: https://ibb.co/xS52vDC
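To make the suspected behavior concrete, here is a minimal, purely hypothetical sketch (the names below are illustrative, not the actual stylegan2-ada-pytorch identifiers): the weights are restored from the snapshot, but the image counter restarts at zero, so schedules and snapshot numbering behave as if training had just begun.

```python
# Hypothetical sketch of the suspected resume logic.
# Names (make_training_state, cur_nimg, resume_kimg) are illustrative only,
# not the real stylegan2-ada-pytorch identifiers.

def make_training_state(resume_weights=None, resume_kimg=0):
    """Build the training state; weights may come from a snapshot pkl."""
    return {
        "weights": resume_weights if resume_weights is not None else "random-init",
        # Suspected issue: the image counter restarts at 0 even when
        # weights are resumed, so snapshot numbering and schedules reset.
        "cur_nimg": int(resume_kimg * 1000),
    }

# Resuming with only the weights: the counter resets to zero.
resumed = make_training_state(resume_weights="network-snapshot-000600.pkl")
assert resumed["cur_nimg"] == 0

# What I expected, as with TF StyleGAN2's resumable kimg counter:
expected = make_training_state(resume_weights="network-snapshot-000600.pkl",
                               resume_kimg=600)
assert expected["cur_nimg"] == 600_000
```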
Expected behavior
I expected resuming to work as in StyleGAN2; I don't want to retrain from the beginning.
Maybe I can set a value to resume from, or maybe this is already implemented but not documented (I don't know).
Sorry if I offended you.
Desktop (please complete the following information):
- OS: Windows 10
- PyTorch version: 1.7.1
- CUDA toolkit version: CUDA 10.0
- NVIDIA driver version: 461.09
- GPU: RTX2070
- Docker: did you use Docker? No.
Additional context
Maybe it's part of ADA's behavior?
I think you can help me.
P.S. Thank you for your work. The PyTorch implementation must have required a lot of effort.
ADA is AWESOME.