Training on my own dataset but getting black images after "sample-10" #101

Open
twentyfiveYang opened this issue Oct 4, 2022 · 2 comments

Thank you for sharing the wonderful code!
I was training on my own dataset but started getting black images after 9000 steps. Does anyone know how to fix this?
Here is my training script and results:

from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
).cuda()

diffusion = GaussianDiffusion(
    model,
    image_size = 64,
    timesteps = 1000,           # number of steps
    sampling_timesteps = 250,   # number of sampling timesteps (using ddim for faster inference [see citation for ddim paper])
    loss_type = 'l1'            # L1 or L2
).cuda()

trainer = Trainer(
    diffusion,
    'G:/code3/DDPM-main/data/train/',
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    amp = True                        # turn on mixed precision
)

trainer.train()

[attached image: training sample results]

ruochiz commented Oct 4, 2022

Do you see NaN loss during training? Based on what I've tested, turning off amp=True resolves the NaN loss in certain cases (at the cost of slower training).
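
For reference, a minimal sketch of that change applied to the training script from the original post (only the amp flag differs; the model, diffusion, and every other Trainer argument are taken verbatim from the script above):

trainer = Trainer(
    diffusion,
    'G:/code3/DDPM-main/data/train/',
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    amp = False                       # mixed precision off: slower, but avoids the NaN loss in the cases described above
)

trainer.train()

Watching the loss reported during training for nan values is the quickest way to confirm you are hitting this failure mode before the samples turn black.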

@twentyfiveYang (Author)

Thank you very much for your reply, it solved my problem.
