
Rename MambaModel/MambaStack to HybridModel/HybridStack#4099

Open
Phlip79 wants to merge 18 commits into NVIDIA:main from Phlip79:philip/rename-to-hybrid

Conversation

@Phlip79 Phlip79 commented Apr 1, 2026

Summary

  • Rename MambaModel → HybridModel, MambaStack → HybridStack, MambaStackSubmodules → HybridStackSubmodules since these classes support multiple layer types (Mamba SSM, Attention, MoE, GDN, MLP) via hybrid_layer_pattern and are not Mamba-specific
  • Rename mamba_stack_spec → hybrid_stack_spec, mamba_inference_stack_spec → hybrid_inference_stack_spec, mamba_builder() → hybrid_builder(), modelopt_gpt_mamba_builder() → modelopt_gpt_hybrid_builder(), and related function names
  • Move canonical files to megatron/core/models/hybrid/ (hybrid_model.py, hybrid_block.py, hybrid_layer_specs.py, hybrid_layer_allocation.py)
  • Rename class-named scripts and directories: pretrain_mamba.py → pretrain_hybrid.py, mamba_builders.py → hybrid_builders.py, tools/run_mamba_text_generation_server*.py → run_hybrid_*, modelopt/mamba/ → modelopt/hybrid/
  • Add backward-compatible aliases (MambaModel = HybridModel, etc.) and re-export stubs at old import paths so existing code continues to work
  • MambaModel is a thin subclass of HybridModel that accepts the deprecated mamba_stack_spec kwarg and forwards it as hybrid_stack_spec
  • Add deprecation warnings for MambaModel, --export-model-type "MambaModel", and --model-provider "mamba"
  • Mamba-specific classes (MambaLayer, MambaMixer, MambaContextParallel, etc.) are unchanged since they are actual Mamba SSM implementations
  • Model-named files stay as-is: examples/mamba/, recipes/h100/mamba*.yaml, test_mamba.py (dist_checkpointing)
  • megatron/core/models/hybrid/__init__.py is intentionally empty to avoid a circular import: hybrid_layer_allocation is imported during megatron.core init, and eagerly importing hybrid_model from __init__.py would create a circular dependency. Import HybridModel from megatron.core.models.hybrid.hybrid_model instead.
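The hybrid_layer_pattern dispatch that motivates the rename can be sketched as a toy parser; the symbol-to-layer mapping below is an assumption for illustration, not Megatron-Core's actual table:

```python
# Illustrative only: each character in the pattern selects a layer type.
# The letter assignments here are assumed, not taken from Megatron-Core.
LAYER_TYPES = {"M": "mamba", "*": "attention", "E": "moe", "-": "mlp"}

def parse_layer_pattern(pattern: str) -> list[str]:
    """Expand a hybrid layer pattern string into a per-layer type list."""
    try:
        return [LAYER_TYPES[ch] for ch in pattern]
    except KeyError as exc:
        raise ValueError(f"unknown layer symbol: {exc.args[0]!r}") from None

print(parse_layer_pattern("M*M-"))  # → ['mamba', 'attention', 'mamba', 'mlp']
```

Since any mix of layer types can appear in the pattern, a name like MambaStack undersells what the class actually builds.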

Tests

Functional tests

@Phlip79 Phlip79 requested review from a team as code owners April 1, 2026 21:30
copy-pr-bot bot commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions bot commented Apr 1, 2026

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft April 1, 2026 21:30
Phlip79 commented Apr 1, 2026

/ok to test 360b582

@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Apr 1, 2026
Phlip79 commented Apr 1, 2026

/ok to test c7dec8a

Phlip79 commented Apr 2, 2026

/claude review

claude bot left a comment


Clean mechanical rename with proper backward-compatible aliases and re-exports. Left a few minor nits on stale references to 'MambaBlock' in comments and a '2026-2026' copyright typo, but nothing blocking.

Phlip79 commented Apr 2, 2026

/claude review

claude bot left a comment


Clean rename with good backward-compatible re-exports at the old import paths. One issue: the MambaModel = HybridModel alias doesn't cover the renamed mamba_stack_spec → hybrid_stack_spec keyword parameter in __init__, so existing callers using MambaModel(mamba_stack_spec=...) will break with a TypeError. See inline comment for a suggested fix.

Phlip79 commented Apr 2, 2026

/ok to test f115ba7

@Phlip79 Phlip79 marked this pull request as ready for review April 3, 2026 02:07
@Phlip79 Phlip79 requested a review from a team as a code owner April 3, 2026 02:07
Phlip79 and others added 16 commits April 3, 2026 02:37
…idStack/HybridStackSubmodules

These classes support multiple layer types (Mamba SSM, Attention, MoE, GDN, MLP)
via hybrid_layer_pattern, so "Hybrid" better reflects their purpose. Mamba-specific
classes (MambaLayer, MambaMixer, MambaContextParallel, etc.) are unchanged.

File renames:
- megatron/core/models/mamba/mamba_model.py -> megatron/core/models/hybrid/hybrid_model.py
- megatron/core/ssm/mamba_block.py -> megatron/core/models/hybrid/hybrid_block.py
- megatron/core/models/mamba/mamba_layer_specs.py -> megatron/core/models/hybrid/hybrid_layer_specs.py
- tests/unit_tests/models/test_mamba_model.py -> tests/unit_tests/models/test_hybrid_model.py
- tests/unit_tests/ssm/test_mamba_block.py -> tests/unit_tests/ssm/test_hybrid_block.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thin re-export modules at the original file locations so that existing
code using the old import paths continues to work:
- megatron.core.models.mamba.mamba_model -> megatron.core.models.hybrid.hybrid_model
- megatron.core.models.mamba.mamba_layer_specs -> megatron.core.models.hybrid.hybrid_layer_specs
- megatron.core.ssm.mamba_block -> megatron.core.models.hybrid.hybrid_block

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
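The re-export stub pattern in this commit can be sketched in a self-contained way; in-memory modules stand in for the real .py files so the example runs as one script, and the module names are shortened stand-ins for the actual paths:

```python
# Toy demonstration of a re-export stub: the module at the old path simply
# aliases the public names from the new canonical module.
import sys
import types

# "New" canonical module (stands in for hybrid_model.py).
new_mod = types.ModuleType("hybrid_model")

class HybridModel:
    pass

new_mod.HybridModel = HybridModel
sys.modules["hybrid_model"] = new_mod

# "Old" stub module (stands in for mamba_model.py) re-exporting the class
# under the legacy name.
old_mod = types.ModuleType("mamba_model")
old_mod.MambaModel = new_mod.HybridModel  # backward-compat alias
sys.modules["mamba_model"] = old_mod

from mamba_model import MambaModel  # legacy import path still works
assert MambaModel is HybridModel
```

The real stubs are ordinary files at the old locations whose only job is `from <new path> import *`-style re-exports.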
File/directory renames (named after the class, not the model):
- mamba_builders.py -> hybrid_builders.py
- pretrain_mamba.py -> pretrain_hybrid.py
- tools/run_mamba_text_generation_server*.py -> run_hybrid_*
- megatron/core/post_training/modelopt/mamba/ -> modelopt/hybrid/
- test_mamba_moe_model.py -> test_hybrid_moe_model.py
- test_mamba_model_expert_parallel_inference.py -> test_hybrid_*
- test_mamba_prefix_caching_e2e.py -> test_hybrid_prefix_caching_e2e.py

Function renames:
- mamba_builder() -> hybrid_builder()
- modelopt_gpt_mamba_builder() -> modelopt_gpt_hybrid_builder()

Files named after the model/architecture (kept as mamba):
- examples/mamba/ (like examples/llama/, examples/gpt3/)
- tests/test_utils/recipes/h100/mamba*.yaml
- tests/unit_tests/dist_checkpointing/models/test_mamba.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nd tests

Same pattern as the model_type fix: the string identifier selects the
hybrid_builder, so it should be "hybrid" not "mamba". Backward compat
preserved in argparse choices.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix "2026-2026" → "2026" in 3 copyright headers
- Fix docstring "Pretrain and SFT Mamba" → "Pretrain and SFT Hybrid"
- Fix stale "MambaBlock" references in comments → "HybridStack"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of a simple alias, MambaModel is now a thin subclass of
HybridModel that intercepts the deprecated mamba_stack_spec keyword
argument and forwards it as hybrid_stack_spec. This ensures existing
code calling MambaModel(mamba_stack_spec=...) continues to work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
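A minimal sketch of the subclass-forwarding pattern this commit describes; the class bodies are placeholders, only the kwarg handoff is the point:

```python
class HybridModel:
    def __init__(self, hybrid_stack_spec=None):
        self.hybrid_stack_spec = hybrid_stack_spec

class MambaModel(HybridModel):
    """Thin wrapper that accepts the deprecated kwarg and forwards it."""
    def __init__(self, *args, mamba_stack_spec=None, **kwargs):
        if mamba_stack_spec is not None:
            # Translate the old keyword into the new one before delegating.
            kwargs.setdefault("hybrid_stack_spec", mamba_stack_spec)
        super().__init__(*args, **kwargs)

m = MambaModel(mamba_stack_spec="spec")
assert m.hybrid_stack_spec == "spec"  # old call sites keep working
```

Unlike a bare `MambaModel = HybridModel` alias, the subclass keeps `MambaModel(mamba_stack_spec=...)` from raising a TypeError.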
- MambaModel.__init__ now warns that MambaModel is deprecated
- --export-model-type "MambaModel" emits a DeprecationWarning
- --model-provider "mamba" emits a DeprecationWarning
- Updated help text on both CLI args to note the deprecation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
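The CLI deprecation described above can be sketched as follows; the flag name comes from the PR description, but the surrounding parser setup and choice list are assumptions:

```python
# Hedged sketch: keep "mamba" as an accepted --model-provider choice, but
# warn and normalize it to "hybrid".
import argparse
import warnings

parser = argparse.ArgumentParser()
parser.add_argument(
    "--model-provider",
    choices=["gpt", "hybrid", "mamba"],
    default="hybrid",
    help='Which builder to use ("mamba" is deprecated; use "hybrid").',
)

args = parser.parse_args(["--model-provider", "mamba"])
if args.model_provider == "mamba":
    warnings.warn(
        '--model-provider "mamba" is deprecated; use "hybrid".',
        DeprecationWarning,
    )
    args.model_provider = "hybrid"

assert args.model_provider == "hybrid"
```

Keeping the old string in `choices` means existing launch scripts keep parsing, while the warning signals the rename.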
This file handles hybrid layer pattern parsing and pipeline segment
selection, which is HybridModel/HybridStack infrastructure rather
than an SSM component.

- megatron/core/ssm/mamba_hybrid_layer_allocation.py -> megatron/core/models/hybrid/hybrid_layer_allocation.py
- test_mamba_hybrid_layer_allocation.py -> test_hybrid_layer_allocation.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- modelopt_gpt_mamba_builder = modelopt_gpt_hybrid_builder
- mamba_builder = hybrid_builder

These were the only renamed public symbols without backward-compat
aliases, unlike MambaModel, MambaStack, mamba_stack_spec, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The hybrid __init__.py eagerly imported HybridStack from hybrid_block,
which pulls in hybrid_layer_allocation and widens the import graph
enough to cause a circular import with megatron.core. The old
mamba/__init__.py only exported MambaModel, so match that pattern.

Users import HybridStack directly from
megatron.core.models.hybrid.hybrid_block as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Phlip79 Phlip79 force-pushed the philip/rename-to-hybrid branch from 0a2ca71 to b4248b7 Compare April 3, 2026 02:39
Phlip79 commented Apr 3, 2026

/ok to test b4248b7

Importing any module from the hybrid package triggers __init__.py.
When hybrid_layer_allocation is imported during megatron.core init
(via dynamic_context.py), eagerly importing hybrid_model.py from
__init__.py creates a circular dependency back to megatron.core.

Fix: make __init__.py empty and update all consumers to import
directly from hybrid.hybrid_model instead of the package.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phlip79 commented Apr 3, 2026

/ok to test 2514cd7

Phlip79 commented Apr 3, 2026

/claude review

claude bot left a comment


One bug found: tools/run_text_generation_server.py doesn't accept model_type="mamba" (raises ValueError), unlike megatron/inference/utils.py which correctly handles both "hybrid" and "mamba" with a deprecation warning. See inline comment for the fix.

Everything else looks good — re-export stubs, backward-compat aliases, MambaModel subclass, deprecation warnings, and test updates are all consistent.

Matches the backward-compat pattern used in megatron/inference/utils.py
for --model-provider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
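The fix this commit describes amounts to one shared normalization step: accept both identifiers, warn on the deprecated one, and reject everything else. A hedged sketch (the function name and the accepted set are invented for illustration):

```python
import warnings

def normalize_model_type(model_type: str) -> str:
    """Map the deprecated "mamba" identifier onto "hybrid" with a warning."""
    if model_type == "mamba":
        warnings.warn(
            'model_type "mamba" is deprecated; use "hybrid".',
            DeprecationWarning,
        )
        return "hybrid"
    if model_type not in ("gpt", "hybrid"):
        raise ValueError(f"unsupported model_type: {model_type!r}")
    return model_type

assert normalize_model_type("mamba") == "hybrid"
assert normalize_model_type("hybrid") == "hybrid"
```

Routing both entry points through one helper avoids the divergence the review caught, where one script warned and the other raised.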
Phlip79 commented Apr 3, 2026

/ok to test e3ed806

Phlip79 commented Apr 3, 2026

/claude review

claude bot left a comment


LGTM
