Description
nanochat/scripts/base_train.py, line 46 at commit c6b7ab7:
`total_batch_size = 524288 # total desired batch size, in #tokens`
total_batch_size is hardcoded to 524288, which comes from 32 * 2048 * 8 (device_batch_size = 32, max_seq_len = 2048, world_size = 8).
In my script I have modified the batch size and the sequence length, and my world size is 1 (single GPU), so all three parameters that make up total_batch_size differ from the defaults.
@karpathy, is there a better way to set total_batch_size? Should it be equal to my device_batch_size * max_seq_len * world_size?
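For context, here is how I currently understand the relationship, assuming a nanoGPT-style setup where gradient accumulation bridges the gap between the tokens processed per forward/backward pass and total_batch_size (the helper names below are mine, not necessarily the script's):

```python
# A minimal sketch of my reading; variable names other than
# total_batch_size are illustrative, not necessarily those in base_train.py.

device_batch_size = 32  # sequences per GPU per micro-batch (default)
max_seq_len = 2048      # tokens per sequence (default)
world_size = 8          # number of GPUs (default)

# tokens processed in one forward/backward pass across all GPUs
tokens_per_fwdbwd = device_batch_size * max_seq_len * world_size  # 524288

total_batch_size = 524288  # desired tokens per optimizer step

# total_batch_size should be divisible by tokens_per_fwdbwd; gradients are
# then accumulated over the remaining factor before each optimizer step
assert total_batch_size % tokens_per_fwdbwd == 0
grad_accum_steps = total_batch_size // tokens_per_fwdbwd
print(grad_accum_steps)  # 1 with the defaults above

# my single-GPU case: e.g. device_batch_size=16, max_seq_len=1024,
# world_size=1 gives tokens_per_fwdbwd = 16384, so keeping
# total_batch_size = 524288 would mean grad_accum_steps = 32
```

If that reading is right, total_batch_size is the optimization-level batch size, and on a single GPU it could stay at 524288 with gradient accumulation making up the difference, rather than shrinking to device_batch_size * max_seq_len * world_size. Is that the intended usage?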
Is there an industry standard for the optimal total batch size as a function of the number of parameters, similar to how total_tokens is calculated?
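(By "how total_tokens is calculated" I mean the Chinchilla-style rule of thumb of roughly 20 training tokens per model parameter, which is what I understand the script to be doing; the parameter count below is a made-up example:

```python
# Chinchilla-style heuristic, as I understand it (numbers approximate):
num_params = 560e6              # hypothetical ~560M-parameter model
total_tokens = 20 * num_params  # ~11.2B training tokens
```

I'm asking whether an analogous rule of thumb exists for total_batch_size.)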
I think there is an ideal batch-size range for a given model. For example, a d10 model presumably shouldn't use the same total_batch_size as a d30 model, right? Not sure. How is this handled currently? Any research papers for reference?
Would love to hear your thoughts on this. Thank you!