improvements from Moonlight #16

Open: wants to merge 1 commit into master

ehartford

MoonshotAI suggested some improvements to Muon while training Moonlight.

https://github.com/MoonshotAI/Moonlight/blob/master/examples/toy_train.py

Perhaps they would make a good addition to Muon?

toothacher17 commented Feb 27, 2025

Hi @ehartford, thanks for mentioning it! To be fair, I think the core ideas behind training Moonlight are:

  1. weight decay
  2. adjusting the update RMS according to matrix shape
  3. matching the update RMS to AdamW's
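The three ideas can be sketched for a single weight matrix as below. This is a minimal, pure-Python sketch under stated assumptions, not the Moonlight code: the orthogonalized momentum `o` (what Muon's Newton-Schulz iteration would produce) is taken as given and replaced by a simple full-rank semi-orthogonal stand-in, and the AdamW-matching factor `0.2 * sqrt(max(A, B))` is the rule as we understand it from the Moonlight paper. The helper names (`rms`, `semi_orthogonal`, `moonlight_step`, `match_rms`) are hypothetical.

```python
import math


def rms(m):
    """Root-mean-square over every entry of a matrix stored as a list of rows."""
    n = sum(len(row) for row in m)
    return math.sqrt(sum(x * x for row in m for x in row) / n)


def semi_orthogonal(a, b):
    """Hypothetical stand-in for Muon's orthogonalized update O.

    Returns an a x b matrix with ones on the main diagonal, so its rows
    (a <= b) or columns (a > b) are orthonormal and it has full rank.
    """
    return [[1.0 if i == j else 0.0 for j in range(b)] for i in range(a)]


def moonlight_step(w, o, lr, wd, match_rms=0.2):
    """One update W <- W - lr * (scale * O + wd * W) for an A x B matrix.

    Idea 1: decoupled weight decay (the wd * W term).
    Ideas 2 and 3: scale depends on the matrix shape so that scale * O has
    an RMS of match_rms, assumed here to be 0.2 to mimic AdamW updates.
    """
    a, b = len(w), len(w[0])
    scale = match_rms * math.sqrt(max(a, b))
    return [
        [(1.0 - lr * wd) * w[i][j] - lr * scale * o[i][j] for j in range(b)]
        for i in range(a)
    ]


# A full-rank A x B semi-orthogonal O has RMS 1 / sqrt(max(A, B)),
# so after scaling every shape ends up with the same update RMS.
for a, b in [(4, 16), (16, 4), (8, 8)]:
    o = semi_orthogonal(a, b)
    scale = 0.2 * math.sqrt(max(a, b))
    print((a, b), round(rms(o), 4), round(scale * rms(o), 4))
```

The raw RMS in the second column changes with shape, while the scaled RMS in the last column stays at 0.2 for every shape, which is the point of ideas 2 and 3 combined.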

Points 1 and 2 should already be covered by Keller's current implementation in the nanogpt setting (as we mentioned in the paper). Point 3 is mostly designed for large-scale, over-trained settings and might not be the best choice for the nanogpt speedrun (small models trained on relatively few tokens).
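To make the difference between points 2 and 3 concrete, the sketch below compares the RMS of the final update under two scalings of a full-rank orthogonalized update O. The `max(1, A/B) ** 0.5` factor is our reading of the shape adjustment in Keller's public Muon code and should be treated as an assumption; the `0.2 * sqrt(max(A, B))` factor is the AdamW-matching rule as we understand it from the Moonlight paper. The function names and matrix sizes are hypothetical.

```python
import math


def orthogonal_rms(a, b):
    """RMS of a full-rank a x b semi-orthogonal matrix: sqrt(min(a, b) / (a * b))."""
    return math.sqrt(min(a, b) / (a * b))


def keller_scale(a, b):
    """Shape factor assumed from Keller's Muon code: only tall matrices are scaled up."""
    return max(1.0, a / b) ** 0.5


def moonlight_scale(a, b, match_rms=0.2):
    """Factor that makes the update RMS equal match_rms for every shape."""
    return match_rms * math.sqrt(max(a, b))


# Shapes loosely modeled on a GPT MLP block (hypothetical sizes).
for a, b in [(3072, 768), (768, 3072), (768, 768)]:
    base = orthogonal_rms(a, b)
    print(
        f"{a}x{b}: Keller-style RMS {keller_scale(a, b) * base:.4f}, "
        f"Moonlight-style RMS {moonlight_scale(a, b) * base:.4f}"
    )
```

Under these assumptions the Keller-style update RMS works out to 1/sqrt(B), so the step size depends on the input dimension and is absorbed into Muon's own learning rate. The Moonlight-style factor instead gives every matrix an update RMS of 0.2, which, as we understand the paper's intent, is what lets AdamW-tuned learning rates and weight decay carry over.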

lin72h commented Mar 4, 2025

@toothacher17 Nice suggestion!
