Hey @long21wt, storing per-example gradients sounds costly; try reducing the batch size until it fits in memory.
The naive way would be to run …
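One way to avoid `vmap` is to compute the gradients sequentially. Below is a minimal sketch that assumes a toy linear model. The names `loss_fn`, `per_example_grads_vmap` and `per_example_grads_map` are hypothetical and used only for illustration. `jax.lax.map` applies the function to one example at a time (it is implemented with a scan), so only one example's intermediate values are live at once instead of the whole batch's. The stacked result still holds one gradient per example, so that part of the memory cost remains.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy model: linear regression with a squared-error loss.
def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

def per_example_grads_vmap(params, xs, ys):
    # Vectorized: intermediates for every example are materialized together.
    def single(x, y):
        # Add a batch axis of size 1 so loss_fn sees a "batch" of one.
        return jax.grad(loss_fn)(params, x[None], y[None])
    return jax.vmap(single)(xs, ys)

def per_example_grads_map(params, xs, ys):
    # Sequential: lax.map runs `single` on one example per step,
    # so peak activation memory is roughly that of a single example.
    def single(xy):
        x, y = xy
        return jax.grad(loss_fn)(params, x[None], y[None])
    return jax.lax.map(single, (xs, ys))
```

Both functions return a pytree shaped like `params` with an extra leading axis of length batch size. They should agree numerically; the sequential version trades speed for a lower peak memory.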
Hi,
I get an OOM error when I try to compute per-example gradients with `vmap`. Is there another way to do it without using `vmap`? For context, I'm using `vmap` together with my own gradient accumulation, implemented by replacing state. Thanks!
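A middle ground between full `vmap` and a one-example-at-a-time loop is to `vmap` within fixed-size chunks and loop over the chunks. The sketch below assumes a toy linear model and a batch size that is divisible by the hypothetical `chunk_size` parameter; `loss_fn` and `per_example_grads_chunked` are illustrative names, not a library API. Larger chunks run faster but use more peak activation memory.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy model and loss; substitute your own.
def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

def per_example_grads_chunked(params, xs, ys, chunk_size):
    """Per-example gradients, computed `chunk_size` examples at a time.

    Assumes xs.shape[0] is divisible by chunk_size.
    """
    n = xs.shape[0]
    assert n % chunk_size == 0, "batch size must be divisible by chunk_size"
    num_chunks = n // chunk_size

    def single(x, y):
        # Gradient for one example (batch axis of size 1 added).
        return jax.grad(loss_fn)(params, x[None], y[None])

    def chunk_grads(chunk):
        cx, cy = chunk
        # Vectorize only within one chunk.
        return jax.vmap(single)(cx, cy)

    # Reshape to (num_chunks, chunk_size, ...) so lax.map iterates over chunks.
    xs_c = xs.reshape((num_chunks, chunk_size) + xs.shape[1:])
    ys_c = ys.reshape((num_chunks, chunk_size) + ys.shape[1:])
    grads = jax.lax.map(chunk_grads, (xs_c, ys_c))
    # Merge the (num_chunks, chunk_size) axes back into one per-example axis.
    return jax.tree_util.tree_map(
        lambda g: g.reshape((n,) + g.shape[2:]), grads
    )
```

This bounds activation memory, not the memory for the stacked gradients. If you only need the accumulated gradient rather than every per-example gradient, summing inside `chunk_grads` (or carrying a running sum through `jax.lax.scan`) avoids storing all of them.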