Update tulu3.md
vwxyzjn authored Dec 3, 2024
1 parent 1017c7c commit e363290
Showing 1 changed file with 13 additions and 4 deletions.
17 changes: 13 additions & 4 deletions docs/tulu3.md
@@ -56,9 +56,18 @@ accelerate launch \
# For Ai2 internal members, this was the experiment URL: https://beaker.org/ex/01JBNTPW8TKG09B2XR832YB5S8
```

> [!NOTE]
> If you have different number of GPUs, please adjust the `NUM_MACHINES`, `NUM_PROCESSES`, `PER_DEVICE_TRAIN_BATCH_SIZE`, and `GRADIENT_ACCUMULATION_STEPS` accordingly. For example, say, you only have 8 GPUs. The command below has an effective batch size of `NUM_PROCESSES * PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS = 64 * 1 * 2 = 128`. A one node setup can simulate our batch size with `NUM_PROCESSES=8`, `PER_DEVICE_TRAIN_BATCH_SIZE=1`, and `GRADIENT_ACCUMULATION_STEPS=64`.
> [!NOTE]
> If you have a different number of GPUs, please adjust `NUM_MACHINES`, `NUM_PROCESSES`, `PER_DEVICE_TRAIN_BATCH_SIZE`, and `GRADIENT_ACCUMULATION_STEPS` accordingly to reproduce the same effective batch size.
> The effective batch size is the product of:
> - Number of GPUs / processes (`NUM_PROCESSES`)
> - Train batch size per GPU (`PER_DEVICE_TRAIN_BATCH_SIZE`)
> - Gradient accumulation steps (`GRADIENT_ACCUMULATION_STEPS`)
>
> So we have:
> ```
> 64 GPUs: 64 * 1 * 2 = 128 # from the example above
> 8 GPUs: 8 * 1 * 16 = 128 # if you only have 8 GPUs
> ```
> You can achieve the same effective batch size with fewer GPUs by increasing the gradient accumulation steps proportionally (e.g., `NUM_PROCESSES=8`, `PER_DEVICE_TRAIN_BATCH_SIZE=1`, and `GRADIENT_ACCUMULATION_STEPS=16`).
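
The arithmetic in the note can be checked with a few lines of Python. This is only a minimal sketch: the helper names `effective_batch_size` and `grad_accum_steps_for` are hypothetical and are not part of the repository's scripts; only the three quantities being multiplied come from the note itself.

```python
# Hypothetical helpers for illustration; they are not part of open-instruct.

def effective_batch_size(num_processes: int,
                         per_device_train_batch_size: int,
                         gradient_accumulation_steps: int) -> int:
    """Product of the three settings named in the note."""
    return (num_processes
            * per_device_train_batch_size
            * gradient_accumulation_steps)


def grad_accum_steps_for(target_batch_size: int,
                         num_processes: int,
                         per_device_train_batch_size: int) -> int:
    """Gradient accumulation steps needed to reach a target effective batch size."""
    samples_per_step = num_processes * per_device_train_batch_size
    if target_batch_size % samples_per_step != 0:
        # The target cannot be matched exactly with these settings.
        raise ValueError(
            f"{target_batch_size} is not divisible by {samples_per_step}"
        )
    return target_batch_size // samples_per_step


if __name__ == "__main__":
    # 64-GPU setting from the note.
    print(effective_batch_size(64, 1, 2))
    # Accumulation steps needed on 8 GPUs for the same effective batch size.
    print(grad_accum_steps_for(128, 8, 1))
```

Raising an error when the target is not evenly divisible is a design choice of this sketch; rounding would silently change the effective batch size.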
### Llama-3.1-Tulu-3-70B-SFT Reproduction
@@ -342,4 +351,4 @@ source configs/beaker_configs/ray_node_setup.sh && python open_instruct/ppo_vllm
--gradient_checkpointing \
--with_tracking
# For Ai2 internal members, this was the experiment URL: https://beaker.org/ex/01JD3YEM4XGH2F2H10Y49GK441/
```
```
