add grad accum support #151

lchu6 · 2025-06-18T15:05:46Z

No description provided.

lchu6 · 2025-06-18T15:09:54Z

@daviswer I think the reporting is correct, but please help me double-check it, as I typed this really quickly.

ddp_stats[1] stores gnorm, and we add to it only once every grad_accum_steps steps.
ddp_stats[2] stores the "denominator", and we only count "real steps".
ddp_stats[0] stores loss, and we add to it on every step (so it accumulates grad_accum_steps times as many values as 1 and 2 above). But since our loss was already scaled as loss = loss / grad_accum_steps, this value is also correct when divided by the denominator above.

So if we have 15 total steps with grad_accum_steps = 3:
we have 5 values for gnorm and denominator = 5, but 15 values for loss. Each of those 15 loss values was already original_loss / 3, so their sum pairs with denominator = 5.
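
To make the bookkeeping above concrete, here is a minimal pure-Python sketch of the accumulation. It is not the code from this PR: the function name `accumulate_stats`, the sample loss values, and the sample gnorm values are all hypothetical, and only the `ddp_stats` layout (index 0 = loss, 1 = gnorm, 2 = denominator) follows the description above.

```python
def accumulate_stats(raw_losses, gnorms, grad_accum_steps):
    """Hypothetical sketch of the reporting logic described above.

    raw_losses: one unscaled loss per micro-step.
    gnorms: one gradient norm per "real" (optimizer) step.
    """
    ddp_stats = [0.0, 0.0, 0.0]  # [loss sum, gnorm sum, denominator]
    for step, raw_loss in enumerate(raw_losses, start=1):
        # The loss is scaled before being accumulated, on every micro-step.
        loss = raw_loss / grad_accum_steps
        ddp_stats[0] += loss
        # gnorm and the denominator are updated only on real steps.
        if step % grad_accum_steps == 0:
            ddp_stats[1] += gnorms[step // grad_accum_steps - 1]
            ddp_stats[2] += 1
    return ddp_stats


# Made-up values: 15 micro-steps with grad_accum_steps = 3.
raw_losses = [3.0 * i for i in range(1, 16)]
gnorms = [1.0, 2.0, 3.0, 4.0, 5.0]

stats = accumulate_stats(raw_losses, gnorms, grad_accum_steps=3)
mean_loss = stats[0] / stats[2]   # equals the mean of the 15 raw losses
mean_gnorm = stats[1] / stats[2]  # equals the mean of the 5 gnorms
```

Under these assumptions, the sum of 15 scaled losses divided by denominator = 5 matches the plain average of the 15 unscaled losses, which is the pairing described above.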

daviswer

That logic checks out, looks good!

add grad accum support

6b27464

lchu6 requested review from daviswer and shiqiangw June 18, 2025 15:05

daviswer approved these changes Jun 18, 2025


shiqiangw merged commit 3f959ee into mamba-tiktoken Jun 18, 2025
0 of 4 checks passed
