
Replace batch_size with global_batch_size. #150

Merged: 11 commits into main on Dec 18, 2023

Conversation

GeorgiosSmyrnis (Collaborator)

This PR deprecates the per-GPU batch size argument on the command line and replaces it with a global batch size.

Tests are also updated to account for this.
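
As a reading aid, here is a minimal sketch of the split this change implies; the function name, the `world_size` argument, and the divisibility check are illustrative assumptions, not the exact open_lm code.

# Illustrative sketch only (assumed names, not the exact open_lm implementation):
# a global batch size is split across ranks, then across gradient-accumulation steps.
def per_gpu_batch_size(global_batch_size: int, world_size: int, accum_freq: int) -> int:
    if global_batch_size % (world_size * accum_freq) != 0:
        raise ValueError("global batch size must be divisible by world_size * accum_freq")
    return global_batch_size // world_size // accum_freq

# Example: 512 sequences per optimizer step on 8 GPUs with accum_freq=2 -> 32 per micro-batch.
assert per_gpu_batch_size(512, world_size=8, accum_freq=2) == 32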

@@ -576,7 +576,7 @@ def parse_args(args):

    if args.val_data is not None and args.val_batch_size is None:
        # if not set explicitly make sure that the val batch size is set to the micro batch size
        args.val_batch_size = args.batch_size // args.accum_freq
        # TODO: is this correct with global batch size?
GeorgiosSmyrnis (Collaborator, Author)
I'm not 100% sure about this part; I think it is assumed that we are running eval with only 1 GPU.

Collaborator

I would do

args.per_gpu_batch_size_val = (args.global_batch_size // world_size // args.accum_freq)

and change references to args.val_batch_size to args.per_gpu_batch_size_val (I know, it's a mouthful).

Other than that this LGTM!
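
For concreteness, a sketch of how that suggestion could slot into parse_args; the helper name is made up, and the assumption that world_size is already known at argument-parsing time may not hold in open_lm.

import argparse

# Sketch of the suggested default (hypothetical helper; not the actual open_lm code).
def set_val_batch_size_default(args: argparse.Namespace, world_size: int) -> None:
    if args.val_data is not None and getattr(args, "val_batch_size", None) is None:
        # Default the per-GPU validation micro-batch to the training micro-batch:
        # split the global batch across ranks, then across accumulation steps.
        args.per_gpu_batch_size_val = args.global_batch_size // world_size // args.accum_freq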

jmercat (Collaborator) commented on Dec 13, 2023

I think we might want to consider having a global batch size in terms of tokens instead of in terms of samples. In most work I've seen, that is what is kept constant regardless of sequence length. It might be unintuitive to the user, though. What do you think?

open_lm/main.py: outdated review thread (resolved)
GeorgiosSmyrnis (Collaborator, Author)

@jmercat I think the batch size in terms of sequences is a better choice, since we assume the data is already tokenized and split into sequences. If we chose the batch size in terms of tokens, the only valid choices would be multiples of the chosen sequence length, which I think would make things fairly unintuitive.

Maybe we will need to rethink this in the future, but to fully support it we would essentially need to resolve #55 first, I think.
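
To illustrate the constraint being discussed, here is a hypothetical helper (the names are made up) that maps a per-step token budget onto a sequence-level global batch size.

# Hypothetical sketch: a token-denominated global batch size only maps cleanly onto
# fixed-length sequences when it is a multiple of seq_len, which is the unintuitive part.
def sequences_per_global_batch(global_batch_tokens: int, seq_len: int) -> int:
    if global_batch_tokens % seq_len != 0:
        raise ValueError(f"token budget {global_batch_tokens} is not a multiple of seq_len {seq_len}")
    return global_batch_tokens // seq_len

# Example: a 2,097,152-token step at seq_len=2048 corresponds to 1024 sequences.
assert sequences_per_global_batch(2_097_152, seq_len=2048) == 1024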

achalddave (Collaborator) left a comment

Left one comment + need to sync with main, feel free to merge after that!

GeorgiosSmyrnis merged commit 41ca9a0 into main on Dec 18, 2023
2 checks passed
sedrick-keh-tri (Collaborator)

Old thread, but just curious: what's the motivation behind switching from a per-GPU batch size to a global batch size?
