Replace batch_size with global_batch_size. #150
Conversation
open_lm/params.py (Outdated)

    @@ -576,7 +576,7 @@ def parse_args(args):

        if args.val_data is not None and args.val_batch_size is None:
            # if not set explicitly make sure that the val batch size is set to the micro batch size
            args.val_batch_size = args.batch_size // args.accum_freq
            # TODO: is this correct with global batch size?
This part I'm not 100% sure about; I think it is assumed that we are running eval with only 1 GPU.
I would do args.per_gpu_batch_size_val = (args.global_batch_size // world_size // args.accum_freq) and change references to args.val_batch_size to args.per_gpu_batch_size_val (I know, it's a mouthful). Other than that this LGTM!
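A rough sketch of how that suggestion could slot into the argument parsing, assuming world_size comes from the torch.distributed setup; the helper name and the getattr default are illustrative, not code from this PR:

```python
import torch.distributed as dist


def set_default_val_batch_size(args):
    # Hypothetical helper, not from this PR: derive the per-GPU validation batch
    # size from the global batch size, the number of ranks, and the gradient
    # accumulation frequency, mirroring the reviewer's suggestion above.
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    if args.val_data is not None and getattr(args, "per_gpu_batch_size_val", None) is None:
        args.per_gpu_batch_size_val = args.global_batch_size // world_size // args.accum_freq
    return args
```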
I think we might want to consider having a global batch size in terms of tokens instead of in terms of samples. In most work I've seen, this is the quantity that is kept constant regardless of the sequence length… it might be unintuitive to the user though… what do you think?
@jmercat I think the batch size in terms of sequences is a better choice, since we assume the data to already be tokenized / split into sequences - if we chose the batch size to be in terms of tokens, the only valid choices would be multiples of the chosen sequence length, which I think would make things fairly unintuitive. Maybe we will need to rethink this in the future, but to fully support it we would essentially need to resolve #55 first, I think.
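For illustration of the divisibility issue mentioned above (the numbers are made up, not from the PR): with pre-tokenized fixed-length sequences, a token-denominated global batch size only maps onto whole sequences when it is an exact multiple of the sequence length.

```python
seq_len = 2048

# A token budget corresponds to a whole number of fixed-length sequences only
# when it is an exact multiple of seq_len.
for tokens in (4_194_304, 4_000_000):
    n_sequences, remainder = divmod(tokens, seq_len)
    status = "ok" if remainder == 0 else "not representable as whole sequences"
    print(f"{tokens} tokens -> {n_sequences} sequences, remainder {remainder} ({status})")
```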
Left one comment + need to sync with main, feel free to merge after that!
Old thread, but just curious: what's the motivation behind switching from per-GPU batch size to global batch size?
This PR deprecates the per-GPU batch size on the command line and replaces it with a global batch size.
Tests are also updated to account for this.
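A minimal sketch of the relationship behind the change, assuming the new option is exposed as --global-batch-size and that the per-GPU micro-batch is derived from world size and accumulation frequency as discussed above; the flag names and launch command are illustrative, so check the merged code for the exact names:

```python
# Illustrative launch: the user specifies one global batch size, and the
# per-GPU micro-batch follows from world size and accumulation frequency, e.g.
#   torchrun --nproc_per_node 8 -m open_lm.main --global-batch-size 2048 --accum-freq 4 ...

global_batch_size = 2048  # sequences per optimizer step, across all GPUs
world_size = 8            # number of GPUs / ranks
accum_freq = 4            # gradient accumulation steps

per_gpu_batch_size = global_batch_size // world_size   # 256 sequences per rank per step
micro_batch_size = per_gpu_batch_size // accum_freq    # 64 sequences per forward/backward pass
print(per_gpu_batch_size, micro_batch_size)
```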