Google colab speedruns #82
MichelNivard
started this conversation in General
-
You need to implement gradient accumulation. IIRC changing the device count changes the effective batch size. This should work to make the loss reproducible: #29 (comment)
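In case it helps, here is a minimal sketch of what gradient accumulation looks like on a single GPU. The toy model, learning rate, and batch numbers are placeholders to make it runnable, not code from the repo; the point is only that each optimizer step accumulates several micro-batches so the effective batch size matches the multi-GPU run:

```python
import torch
import torch.nn as nn

# Toy model and random data just to make the loop runnable; in the real run these
# would be the nanoGPT model, its optimizers, and the FineWeb data loader.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 128).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

grad_accum_steps = 8   # e.g. original world_size / current world_size
micro_batch = 8        # shrink this if the A100 runs out of memory

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    for micro_step in range(grad_accum_steps):
        x = torch.randn(micro_batch, 128, device=device)
        y = torch.randn(micro_batch, 128, device=device)
        loss = nn.functional.mse_loss(model(x), y)
        (loss / grad_accum_steps).backward()   # scale so gradients average over micro-batches
    optimizer.step()
```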
-
Hi all,
I have just started modding the repo (specific commit: d6a7f06) for speedruns on Google Colab.
see: https://github.com/MichelNivard/modded-nanogpt
So far, to make things run at all, I have had to set
fp8=False
on line 330 (the A100 doesn't do FP8). More obviously, I have had to set world_size = 1, and in the run script I have set nproc_per_node = 1, roughly as sketched below.
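For anyone following along, the single-GPU edits amount to something like this. This is a sketch of the changes described above, not exact code from the commit; the script names are assumptions, so adjust them to whatever the repo actually uses:

```python
# In the training script, around the line mentioned above:
fp8 = False          # the A100 has no FP8 tensor cores, so this must be off

# And the distributed setup collapses to a single process:
world_size = 1       # one A100 on Colab instead of the usual multi-GPU node
# Run script: torchrun --standalone --nproc_per_node=1 train_gpt.py
#             (script name is an assumption; use the repo's actual entry point)
```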
These changes lead to a pretty serious setback in validation loss after a set number of steps compared with recent results. For example, at step 1375:
Colab w A100:
step:1375/1770 val_loss:3.9751 train_time:1117031ms step_avg:812.39ms
A recent record (Sub 3 Min):
step:1375/1393 val_loss:3.2820 train_time:177070ms step_avg:129.72ms
Any ideas on how to claw some of that back? Clearly, setting FP8 to False interacts with other mods and affects model quality, not just speed.
I'll add that during training the GPU is under strong memory pressure right now, so hints on how to tweak the batch size would be appreciated as well!
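On the batch-size question, this is the arithmetic I would start from. A sketch only, with illustrative placeholder numbers rather than the commit's real values: keep the effective tokens per optimizer step fixed and trade per-device batch size against accumulation steps to relieve memory pressure.

```python
# Effective tokens per optimizer step should match the multi-GPU record run for
# the loss curves to be comparable. All numbers below are illustrative placeholders.
seq_len           = 1024   # tokens per sequence (check the commit's actual value)
device_batch_size = 64     # sequences per micro-batch on the single A100
grad_accum_steps  = 8      # compensates for world_size dropping from 8 to 1
world_size        = 1

tokens_per_step = seq_len * device_batch_size * grad_accum_steps * world_size
print(f"effective tokens per step: {tokens_per_step:,}")

# If the A100 runs out of memory, halve device_batch_size and double
# grad_accum_steps: the product (and hence the effective batch size) stays the
# same, only wall-clock time per step goes up.
```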