I run into a DDP error when trying to freeze part of the image encoder. This is how I'm launching my training run:
The error I ran into is:
I tried adding `ddp_args['find_unused_parameters'] = True` to the script to try and combat this, but that yielded a different error:

The error message above made me think that the issue was perhaps due to gradient checkpointing. My hunch was right: turning it off (both with and without `ddp_args['find_unused_parameters'] = True`) now works with the above settings.

Thought I'd report this anyway because it's unclear to me whether this is a bug or whether those two settings are incompatible.
I'm yet to measure speed, but the only drawback of turning off checkpointing is smaller batch sizes, which I'm able to work around with the `accum-freq` arg.
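
For anyone else using the same workaround, here is a rough sketch of the arithmetic, assuming `accum-freq` behaves like standard gradient accumulation (effective batch = per-GPU batch × number of GPUs × accumulation steps). The helper name `accum_freq_for` and the numbers below are hypothetical and are not taken from my actual run.

```python
import math


def accum_freq_for(target_global_batch: int, per_gpu_batch: int, world_size: int) -> int:
    """Smallest number of accumulation steps that reaches the target global batch.

    Assumes accumulation simply multiplies the number of samples seen per
    optimizer step; this is an illustration, not the training script's logic.
    """
    if min(target_global_batch, per_gpu_batch, world_size) <= 0:
        raise ValueError("all arguments must be positive")
    # Samples processed across all GPUs in one forward/backward pass.
    per_step = per_gpu_batch * world_size
    return math.ceil(target_global_batch / per_step)


# Hypothetical example: checkpointing off means only 64 samples fit per GPU,
# but the run was tuned for a global batch of 2048 on 8 GPUs.
steps = accum_freq_for(target_global_batch=2048, per_gpu_batch=64, world_size=8)
print(steps, steps * 64 * 8)  # prints "4 2048"
```

The result would be passed as the `accum-freq` value. Whether this matches a larger batch exactly depends on how the training script accumulates, so treat it as a starting point.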