Emerging Optimizers

Overview

Emerging Optimizers is a research project focused on understanding and optimizing the algorithmic behavior of emerging optimizers (including Shampoo, SOAP, Muon, and others) and their implications for GPU system performance in LLM training.

⚠️ Note: Emerging-Optimizers is under active development. All APIs are experimental and subject to change. New features, improvements, and documentation updates are released regularly. Your feedback and contributions are welcome, and we encourage you to follow along as new updates roll out.

Background

What are Emerging Optimizers?

Emerging optimizers represent a class of novel optimization algorithms that go beyond traditional first-order methods like Adam or SGD. These include optimizers that use matrix-based (non-diagonal) preconditioning, orthogonalization techniques, and other innovative approaches to achieve faster convergence and improved training efficiency.
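To make the contrast between diagonal and matrix-based preconditioning concrete, here is a minimal NumPy sketch. It is not this project's implementation. It assumes a single 2D gradient, uses Shampoo-style left and right statistics with inverse fourth roots, and omits momentum, grafting, and amortized root computation. The function names and the `eps` damping value are illustrative assumptions, and NumPy is used only to keep the sketch self-contained.

```python
import numpy as np


def inverse_root(mat, power, eps=1e-6):
    """Return mat^(-1/power) for a symmetric PSD matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # Clamp tiny negative eigenvalues from round-off, then add damping (assumed value).
    eigvals = np.maximum(eigvals, 0.0) + eps
    # Scale each eigenvector column, then rotate back: V diag(w) V^T.
    return (eigvecs * eigvals ** (-1.0 / power)) @ eigvecs.T


def diagonal_direction(grad, accum, eps=1e-6):
    """Diagonal (Adagrad-style) preconditioning: one scale per element."""
    accum = accum + grad ** 2
    return grad / (np.sqrt(accum) + eps), accum


def matrix_direction(grad, left_stat, right_stat):
    """Matrix (Shampoo-style) preconditioning: one factor per tensor dimension."""
    left_stat = left_stat + grad @ grad.T    # shape (m, m)
    right_stat = right_stat + grad.T @ grad  # shape (n, n)
    direction = inverse_root(left_stat, 4) @ grad @ inverse_root(right_stat, 4)
    return direction, left_stat, right_stat


# Illustrative gradient; statistics start at zero for simplicity.
grad = np.array([[3.0, 1.0], [1.0, 2.0]])
diag_dir, _ = diagonal_direction(grad, np.zeros_like(grad))
mat_dir, _, _ = matrix_direction(grad, np.zeros((2, 2)), np.zeros((2, 2)))
```

In this sketch, with zero initial statistics and one full-rank gradient, the matrix-preconditioned direction comes out close to an orthogonal matrix, which hints at why preconditioning and orthogonalization approaches are often discussed together.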

Examples include Shampoo, which uses Kronecker-factored preconditioning (arXiv:1802.09568), and Muon, which uses Newton-Schulz orthogonalization (arXiv:2502.16982).
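As an illustration of Newton-Schulz orthogonalization, the sketch below uses the classic cubic iteration in NumPy. This is a simplified stand-in and an assumption of this example: Muon implementations typically use a tuned higher-order polynomial and run on GPU tensors, and the function name and step count here are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(matrix, steps=20):
    """Approximate the orthogonal polar factor U @ V^T of a 2D matrix.

    Uses the classic cubic iteration X <- 1.5 X - 0.5 X X^T X, which
    pushes every nonzero singular value toward 1 without computing an SVD.
    """
    # Dividing by the Frobenius norm bounds all singular values by 1,
    # which keeps the iteration inside its convergence region.
    x = matrix / (np.linalg.norm(matrix) + 1e-12)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x


# Compare against the exact polar factor from an SVD.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
approx = newton_schulz_orthogonalize(a)
u, _, vt = np.linalg.svd(a)
exact = u @ vt
```

The appeal of this family of iterations is that it needs only matrix multiplications, which map well onto GPU hardware. Small singular values converge slowly under the cubic form, so ill-conditioned inputs may need more steps than the default assumed here.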

Why They Matter

Emerging optimizers have demonstrated significant practical impact in large-scale language model training. Most notably, Muon was used to train the Kimi K2 model (arXiv:2507.20534), showcasing the effectiveness of these novel approaches at scale. These optimizers can:

Achieve faster convergence, reducing the number of training steps required
Improve final model quality through better conditioning of the optimization landscape
Enable more efficient hyperparameter tuning due to reduced sensitivity to learning rates

Installation

Prerequisites

Python 3.10 or higher (3.12 is recommended)
PyTorch 2.0 or higher

Install from Source

git clone https://github.com/NVIDIA-NeMo/Emerging-Optimizers.git
cd Emerging-Optimizers
pip install .
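The prerequisites listed above can be checked from Python with a small script such as the one below. This is a convenience sketch rather than part of the project; `check_prerequisites` is a hypothetical helper, and it assumes PyTorch is installed under the distribution name `torch`.

```python
import sys
from importlib import metadata


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Python 3.10 or higher is required.
    if sys.version_info < (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.10+ required, found {found}")

    # PyTorch 2.0 or higher is required.
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        # Versions look like "2.3.0" or "2.3.0+cu121"; only the major part matters here.
        major = int(torch_version.split(".")[0])
        if major < 2:
            problems.append(f"PyTorch 2.0+ required, found {torch_version}")

    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("All prerequisites satisfied" if not issues else "\n".join(issues))
```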

Usage

Muon Optimizer

Muon (MomentUm Orthogonalized by Newton-schulz) uses orthogonalization for 2D parameters.
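Conceptually, a Muon-style update accumulates momentum and then orthogonalizes it before applying it to the weights. The sketch below illustrates that idea only and does not use this library's API. `muon_style_step` and `orthogonal_factor` are hypothetical names, the hyperparameter defaults are assumptions, and an exact SVD stands in for the Newton-Schulz iteration to keep the example short.

```python
import numpy as np


def orthogonal_factor(matrix):
    """Exact U @ V^T from a reduced SVD (stand-in for Newton-Schulz)."""
    u, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return u @ vt


def muon_style_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One illustrative update for a single 2D parameter.

    Returns the new weight and the new momentum buffer.
    """
    if weight.ndim != 2:
        raise ValueError("this sketch only handles 2D parameters")
    # Accumulate momentum from the raw gradient.
    momentum_buf = beta * momentum_buf + grad
    # Replace the momentum matrix by its nearest (semi-)orthogonal matrix.
    update = orthogonal_factor(momentum_buf)
    return weight - lr * update, momentum_buf


# Illustrative 2x3 parameter with a zero-initialized momentum buffer.
weight = np.zeros((2, 3))
grad = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
new_weight, new_buf = muon_style_step(weight, grad, np.zeros_like(weight), lr=0.1)
```

Because the applied update has orthonormal rows or columns, its scale does not depend on the gradient magnitude. Parameters that are not 2D, such as biases, fall outside this sketch.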

For a simple usage example, see tests/test_orthogonalized_optimizer.py::MuonTest.

Integration with Megatron Core

Integration with Megatron Core is in progress. See the integration PR that demonstrates usage with Dense and MoE models.

Benchmarks

Coming soon.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
