
Rename MambaModel/MambaStack to HybridModel/HybridStack#4099

Open
Phlip79 wants to merge 18 commits into NVIDIA:main from Phlip79:philip/rename-to-hybrid

Conversation

@Phlip79 Phlip79 commented Apr 1, 2026

Summary

  • Rename MambaModel → HybridModel, MambaStack → HybridStack, MambaStackSubmodules → HybridStackSubmodules since these classes support multiple layer types (Mamba SSM, Attention, MoE, GDN, MLP) via hybrid_layer_pattern and are not Mamba-specific
  • Rename mamba_stack_spec → hybrid_stack_spec, mamba_inference_stack_spec → hybrid_inference_stack_spec, mamba_builder() → hybrid_builder(), modelopt_gpt_mamba_builder() → modelopt_gpt_hybrid_builder(), and related function names
  • Move canonical files to megatron/core/models/hybrid/ (hybrid_model.py, hybrid_block.py, hybrid_layer_specs.py, hybrid_layer_allocation.py)
  • Rename class-named scripts and directories: pretrain_mamba.py → pretrain_hybrid.py, mamba_builders.py → hybrid_builders.py, tools/run_mamba_text_generation_server*.py → run_hybrid_*, modelopt/mamba/ → modelopt/hybrid/
  • Add backward-compatible aliases (MambaModel = HybridModel, etc.) and re-export stubs at old import paths so existing code continues to work
  • MambaModel is a thin subclass of HybridModel that accepts the deprecated mamba_stack_spec kwarg and forwards it as hybrid_stack_spec
  • Add deprecation warnings for MambaModel, --export-model-type "MambaModel", and --model-provider "mamba"
  • Mamba-specific classes (MambaLayer, MambaMixer, MambaContextParallel, etc.) are unchanged since they are actual Mamba SSM implementations
  • Model-named files stay as-is: examples/mamba/, recipes/h100/mamba*.yaml, test_mamba.py (dist_checkpointing)
  • megatron/core/models/hybrid/__init__.py is intentionally empty to avoid a circular import: hybrid_layer_allocation is imported during megatron.core init, and eagerly importing hybrid_model from __init__.py would create a circular dependency. Import HybridModel from megatron.core.models.hybrid.hybrid_model instead.
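The hybrid_layer_pattern dispatch that motivates the rename can be sketched as a toy parser; the symbol-to-layer mapping below is an assumption for illustration, not Megatron-Core's actual table:

```python
# Illustrative only: each character in the pattern selects a layer type.
# The letter assignments here are assumed, not taken from Megatron-Core.
LAYER_TYPES = {"M": "mamba", "*": "attention", "E": "moe", "-": "mlp"}

def parse_layer_pattern(pattern: str) -> list[str]:
    """Expand a hybrid layer pattern string into a per-layer type list."""
    try:
        return [LAYER_TYPES[ch] for ch in pattern]
    except KeyError as exc:
        raise ValueError(f"unknown layer symbol: {exc.args[0]!r}") from None

print(parse_layer_pattern("M*M-"))  # → ['mamba', 'attention', 'mamba', 'mlp']
```

Since any mix of layer types can appear in the pattern, a name like MambaStack undersells what the class actually builds.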

Tests

Functional tests

@Phlip79 Phlip79 requested review from a team as code owners April 1, 2026 21:30
copy-pr-bot bot commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions bot commented Apr 1, 2026

This PR has been automatically converted to draft because all PRs must start as drafts.

When you are ready for review, click Ready for Review to begin the review process. This will:

  1. Add the oncall reviewer (optional reviewer)
  2. Add required review teams based on your changes

See the contribution guide for more details.

@svcnvidia-nemo-ci svcnvidia-nemo-ci marked this pull request as draft April 1, 2026 21:30
Phlip79 commented Apr 1, 2026

/ok to test 360b582

@svcnvidia-nemo-ci svcnvidia-nemo-ci added this to the Core 0.16 milestone Apr 1, 2026
Phlip79 commented Apr 1, 2026

/ok to test c7dec8a

Phlip79 commented Apr 2, 2026

/claude review

claude bot left a comment


Clean mechanical rename with proper backward-compatible aliases and re-exports. Left a few minor nits on stale references to 'MambaBlock' in comments and a '2026-2026' copyright typo, but nothing blocking.

Phlip79 commented Apr 2, 2026

/claude review

claude bot left a comment


Clean rename with good backward-compatible re-exports at the old import paths. One issue: the MambaModel = HybridModel alias doesn't cover the renamed mamba_stack_spec → hybrid_stack_spec keyword parameter in __init__, so existing callers using MambaModel(mamba_stack_spec=...) will break with a TypeError. See inline comment for a suggested fix.

Phlip79 commented Apr 2, 2026

/ok to test f115ba7

@Phlip79 Phlip79 marked this pull request as ready for review April 3, 2026 02:07
@Phlip79 Phlip79 requested a review from a team as a code owner April 3, 2026 02:07
Phlip79 and others added 16 commits April 3, 2026 02:37
…idStack/HybridStackSubmodules

These classes support multiple layer types (Mamba SSM, Attention, MoE, GDN, MLP)
via hybrid_layer_pattern, so "Hybrid" better reflects their purpose. Mamba-specific
classes (MambaLayer, MambaMixer, MambaContextParallel, etc.) are unchanged.

File renames:
- megatron/core/models/mamba/mamba_model.py -> megatron/core/models/hybrid/hybrid_model.py
- megatron/core/ssm/mamba_block.py -> megatron/core/models/hybrid/hybrid_block.py
- megatron/core/models/mamba/mamba_layer_specs.py -> megatron/core/models/hybrid/hybrid_layer_specs.py
- tests/unit_tests/models/test_mamba_model.py -> tests/unit_tests/models/test_hybrid_model.py
- tests/unit_tests/ssm/test_mamba_block.py -> tests/unit_tests/ssm/test_hybrid_block.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Thin re-export modules at the original file locations so that existing
code using the old import paths continues to work:
- megatron.core.models.mamba.mamba_model -> megatron.core.models.hybrid.hybrid_model
- megatron.core.models.mamba.mamba_layer_specs -> megatron.core.models.hybrid.hybrid_layer_specs
- megatron.core.ssm.mamba_block -> megatron.core.models.hybrid.hybrid_block

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
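The re-export stub pattern in this commit can be sketched in a self-contained way; in-memory modules stand in for the real .py files so the example runs as one script, and the module names are shortened stand-ins for the actual paths:

```python
# Toy demonstration of a re-export stub: the module at the old path simply
# aliases the public names from the new canonical module.
import sys
import types

# "New" canonical module (stands in for hybrid_model.py).
new_mod = types.ModuleType("hybrid_model")

class HybridModel:
    pass

new_mod.HybridModel = HybridModel
sys.modules["hybrid_model"] = new_mod

# "Old" stub module (stands in for mamba_model.py) re-exporting the class
# under the legacy name.
old_mod = types.ModuleType("mamba_model")
old_mod.MambaModel = new_mod.HybridModel  # backward-compat alias
sys.modules["mamba_model"] = old_mod

from mamba_model import MambaModel  # legacy import path still works
assert MambaModel is HybridModel
```

The real stubs are ordinary files at the old locations whose only job is `from <new path> import *`-style re-exports.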
File/directory renames (named after the class, not the model):
- mamba_builders.py -> hybrid_builders.py
- pretrain_mamba.py -> pretrain_hybrid.py
- tools/run_mamba_text_generation_server*.py -> run_hybrid_*
- megatron/core/post_training/modelopt/mamba/ -> modelopt/hybrid/
- test_mamba_moe_model.py -> test_hybrid_moe_model.py
- test_mamba_model_expert_parallel_inference.py -> test_hybrid_*
- test_mamba_prefix_caching_e2e.py -> test_hybrid_prefix_caching_e2e.py

Function renames:
- mamba_builder() -> hybrid_builder()
- modelopt_gpt_mamba_builder() -> modelopt_gpt_hybrid_builder()

Files named after the model/architecture (kept as mamba):
- examples/mamba/ (like examples/llama/, examples/gpt3/)
- tests/test_utils/recipes/h100/mamba*.yaml
- tests/unit_tests/dist_checkpointing/models/test_mamba.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nd tests

Same pattern as the model_type fix: the string identifier selects the
hybrid_builder, so it should be "hybrid" not "mamba". Backward compat
preserved in argparse choices.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix "2026-2026" → "2026" in 3 copyright headers
- Fix docstring "Pretrain and SFT Mamba" → "Pretrain and SFT Hybrid"
- Fix stale "MambaBlock" references in comments → "HybridStack"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of a simple alias, MambaModel is now a thin subclass of
HybridModel that intercepts the deprecated mamba_stack_spec keyword
argument and forwards it as hybrid_stack_spec. This ensures existing
code calling MambaModel(mamba_stack_spec=...) continues to work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
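A minimal sketch of the subclass-forwarding pattern this commit describes; the class bodies are placeholders, only the kwarg handoff is the point:

```python
class HybridModel:
    def __init__(self, hybrid_stack_spec=None):
        self.hybrid_stack_spec = hybrid_stack_spec

class MambaModel(HybridModel):
    """Thin wrapper that accepts the deprecated kwarg and forwards it."""
    def __init__(self, *args, mamba_stack_spec=None, **kwargs):
        if mamba_stack_spec is not None:
            # Translate the old keyword into the new one before delegating.
            kwargs.setdefault("hybrid_stack_spec", mamba_stack_spec)
        super().__init__(*args, **kwargs)

m = MambaModel(mamba_stack_spec="spec")
assert m.hybrid_stack_spec == "spec"  # old call sites keep working
```

Unlike a bare `MambaModel = HybridModel` alias, the subclass keeps `MambaModel(mamba_stack_spec=...)` from raising a TypeError.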
- MambaModel.__init__ now warns that MambaModel is deprecated
- --export-model-type "MambaModel" emits a DeprecationWarning
- --model-provider "mamba" emits a DeprecationWarning
- Updated help text on both CLI args to note the deprecation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
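The CLI deprecation described above can be sketched as follows; the flag name comes from the PR description, but the surrounding parser setup and choice list are assumptions:

```python
# Hedged sketch: keep "mamba" as an accepted --model-provider choice, but
# warn and normalize it to "hybrid".
import argparse
import warnings

parser = argparse.ArgumentParser()
parser.add_argument(
    "--model-provider",
    choices=["gpt", "hybrid", "mamba"],
    default="hybrid",
    help='Which builder to use ("mamba" is deprecated; use "hybrid").',
)

args = parser.parse_args(["--model-provider", "mamba"])
if args.model_provider == "mamba":
    warnings.warn(
        '--model-provider "mamba" is deprecated; use "hybrid".',
        DeprecationWarning,
    )
    args.model_provider = "hybrid"

assert args.model_provider == "hybrid"
```

Keeping the old string in `choices` means existing launch scripts keep parsing, while the warning signals the rename.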
This file handles hybrid layer pattern parsing and pipeline segment
selection, which is HybridModel/HybridStack infrastructure rather
than an SSM component.

- megatron/core/ssm/mamba_hybrid_layer_allocation.py -> megatron/core/models/hybrid/hybrid_layer_allocation.py
- test_mamba_hybrid_layer_allocation.py -> test_hybrid_layer_allocation.py

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- modelopt_gpt_mamba_builder = modelopt_gpt_hybrid_builder
- mamba_builder = hybrid_builder

These were the only renamed public symbols without backward-compat
aliases, unlike MambaModel, MambaStack, mamba_stack_spec, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The hybrid __init__.py eagerly imported HybridStack from hybrid_block,
which pulls in hybrid_layer_allocation and widens the import graph
enough to cause a circular import with megatron.core. The old
mamba/__init__.py only exported MambaModel, so match that pattern.

Users import HybridStack directly from
megatron.core.models.hybrid.hybrid_block as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Phlip79 Phlip79 force-pushed the philip/rename-to-hybrid branch from 0a2ca71 to b4248b7 Compare April 3, 2026 02:39
Phlip79 commented Apr 3, 2026

/ok to test b4248b7

Importing any module from the hybrid package triggers __init__.py.
When hybrid_layer_allocation is imported during megatron.core init
(via dynamic_context.py), eagerly importing hybrid_model.py from
__init__.py creates a circular dependency back to megatron.core.

Fix: make __init__.py empty and update all consumers to import
directly from hybrid.hybrid_model instead of the package.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phlip79 commented Apr 3, 2026

/ok to test 2514cd7

Phlip79 commented Apr 3, 2026

/claude review

claude bot left a comment


One bug found: tools/run_text_generation_server.py doesn't accept model_type="mamba" (raises ValueError), unlike megatron/inference/utils.py which correctly handles both "hybrid" and "mamba" with a deprecation warning. See inline comment for the fix.

Everything else looks good — re-export stubs, backward-compat aliases, MambaModel subclass, deprecation warnings, and test updates are all consistent.

Matches the backward-compat pattern used in megatron/inference/utils.py
for --model-provider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
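The fix this commit describes amounts to one shared normalization step: accept both identifiers, warn on the deprecated one, and reject everything else. A hedged sketch (the function name and the accepted set are invented for illustration):

```python
import warnings

def normalize_model_type(model_type: str) -> str:
    """Map the deprecated "mamba" identifier onto "hybrid" with a warning."""
    if model_type == "mamba":
        warnings.warn(
            'model_type "mamba" is deprecated; use "hybrid".',
            DeprecationWarning,
        )
        return "hybrid"
    if model_type not in ("gpt", "hybrid"):
        raise ValueError(f"unsupported model_type: {model_type!r}")
    return model_type

assert normalize_model_type("mamba") == "hybrid"
assert normalize_model_type("hybrid") == "hybrid"
```

Routing both entry points through one helper avoids the divergence the review caught, where one script warned and the other raised.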
Phlip79 commented Apr 3, 2026

/ok to test e3ed806

Phlip79 commented Apr 3, 2026

/claude review

claude bot left a comment


LGTM
