Replace batch_size with global_batch_size. #150
Conversation
open_lm/params.py (Outdated)

    @@ -576,7 +576,7 @@ def parse_args(args):

        if args.val_data is not None and args.val_batch_size is None:
            # if not set explicitly make sure that the val batch size is set to the micro batch size
            args.val_batch_size = args.batch_size // args.accum_freq
            # TODO: is this correct with global batch size?
This part I'm not 100% sure about; I think it is assumed that we are running eval with only 1 GPU.
I would do args.per_gpu_batch_size_val = (args.global_batch_size // world_size // args.accum_freq) and change references to args.val_batch_size to args.per_gpu_batch_size_val (I know, it's a mouthful). Other than that this LGTM!
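A rough sketch of how that suggestion could slot into the argument parsing, assuming world_size comes from the torch.distributed setup; the helper name and the getattr default are illustrative, not code from this PR:

```python
import torch.distributed as dist


def set_default_val_batch_size(args):
    # Hypothetical helper, not from this PR: derive the per-GPU validation batch
    # size from the global batch size, the number of ranks, and the gradient
    # accumulation frequency, mirroring the reviewer's suggestion above.
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    if args.val_data is not None and getattr(args, "per_gpu_batch_size_val", None) is None:
        args.per_gpu_batch_size_val = args.global_batch_size // world_size // args.accum_freq
    return args
```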
I think we might want to consider having a global batch size in terms of tokens instead of in terms of samples. In most work I've seen, this is the quantity that is kept constant regardless of the sequence length… it might be unintuitive to the user though… what do you think?
@jmercat I think the batch size in terms of sequences is a better choice, since we assume the data to already be tokenized / split into sequences - if we chose the batch size to be in terms of tokens, the only valid choices would be multiples of the chosen sequence length, which I think would make things fairly unintuitive. Maybe we will need to rethink this in the future, but to fully support it we would essentially need to resolve #55 first, I think.
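For illustration of the divisibility issue mentioned above (the numbers are made up, not from the PR): with pre-tokenized fixed-length sequences, a token-denominated global batch size only maps onto whole sequences when it is an exact multiple of the sequence length.

```python
seq_len = 2048

# A token budget corresponds to a whole number of fixed-length sequences only
# when it is an exact multiple of seq_len.
for tokens in (4_194_304, 4_000_000):
    n_sequences, remainder = divmod(tokens, seq_len)
    status = "ok" if remainder == 0 else "not representable as whole sequences"
    print(f"{tokens} tokens -> {n_sequences} sequences, remainder {remainder} ({status})")
```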
Left one comment + need to sync with main, feel free to merge after that!
Old thread, but just curious: what's the motivation behind switching from per-GPU batch size to global batch size?
This PR deprecates the per-GPU batch size on the command line and replaces it with a global batch size.
Tests are also updated to account for this.
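A minimal sketch of the relationship behind the change, assuming the new option is exposed as --global-batch-size and that the per-GPU micro-batch is derived from world size and accumulation frequency as discussed above; the flag names and launch command are illustrative, so check the merged code for the exact names:

```python
# Illustrative launch: the user specifies one global batch size, and the
# per-GPU micro-batch follows from world size and accumulation frequency, e.g.
#   torchrun --nproc_per_node 8 -m open_lm.main --global-batch-size 2048 --accum-freq 4 ...

global_batch_size = 2048  # sequences per optimizer step, across all GPUs
world_size = 8            # number of GPUs / ranks
accum_freq = 4            # gradient accumulation steps

per_gpu_batch_size = global_batch_size // world_size   # 256 sequences per rank per step
micro_batch_size = per_gpu_batch_size // accum_freq    # 64 sequences per forward/backward pass
print(per_gpu_batch_size, micro_batch_size)
```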