Need clarity on total batch size #287
Unanswered
nitishpandey04 asked this question in Q&A
Replies: 1 comment
I'm not sure it matters that much if you don't specify it.
nanochat/scripts/base_train.py, line 46 (commit c6b7ab7):
`total_batch_size` is hardcoded to 524288, which comes from 32 * 2048 * 8 (`device_batch_size` of 32, `max_seq_len` of 2048, `world_size` of 8). In my script I have modified the batch size and the sequence length, and the world size is also 1 (single GPU), i.e. all three parameters making up `total_batch_size` are different.

@karpathy, is there a better way to set `total_batch_size`? Should it be equal to my `device_batch_size * max_seq_len * world_size`?

Is there an industry standard for the optimal total batch size as a function of the number of parameters, similar to how `total_tokens` is calculated?

I think there is an ideal batch-size range for a model. For example, a d10 model shouldn't use the same `total_batch_size` as a d30 model, right? Not sure. How is it done currently? Any research papers for reference?

Would love to hear your thoughts on this. Thank you!