Merged
Changes from 2 commits
2 changes: 1 addition & 1 deletion README.md
@@ -127,7 +127,7 @@ The following is the list of models supported by MCore-Bridge:
| Series | model_type |
| -------- | ------------------------------------------------------------ |
| Qwen | qwen2, qwen2_moe<br />qwen3, qwen3_moe, qwen3_next |
-| DeepSeek | deepseek_v3, deepseek_v32 |
+| DeepSeek | deepseek_v3, deepseek_v32, deepseek_v4 |

Priority: high

The PR adds deepseek_v4 to the list of supported models, but the actual implementation appears to be missing. The file src/mcore_bridge/model/gpts/deepseek_v4.py is empty in the provided context, and there are no changes to model registration or configuration logic to support this new model type. Please ensure the implementation is included or clarify if it relies on an existing model type.

| GLM | glm4, glm4_moe, glm4_moe_lite<br />glm_moe_dsa |
| MiniMax | minimax_m2 |
| Kimi | kimi_k2, kimi_k25 |
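
The review comment above flags `deepseek_v4` as listed in the README without a matching implementation or registration. A minimal sketch of a consistency check that would catch this kind of drift follows. How MCore-Bridge actually registers model types is not shown in this diff, so `registered` is simply a set supplied by the caller, and the function names `readme_model_types` and `missing_registrations` are hypothetical.

```python
import re


def readme_model_types(readme_text):
    """Collect model_type names from two-column `| Series | model_type |` rows."""
    found = set()
    for line in readme_text.splitlines():
        line = line.strip()
        if not (line.startswith('|') and line.endswith('|')):
            continue
        cells = [cell.strip() for cell in line[1:-1].split('|')]
        if len(cells) != 2:
            continue
        series, types = cells
        # Skip the header row and the `| --- | --- |` separator row.
        if types == 'model_type' or set(series) <= {'-'}:
            continue
        # Names are separated by commas or by <br /> line breaks.
        for name in re.split(r',|<br\s*/?>', types):
            if name.strip():
                found.add(name.strip())
    return found


def missing_registrations(readme_text, registered):
    """Return README-listed model types that are absent from `registered`."""
    return sorted(readme_model_types(readme_text) - set(registered))


# Sample rows copied from the README table in this PR.
sample = """
| Series | model_type |
| -------- | ---------- |
| Qwen | qwen2, qwen2_moe<br />qwen3, qwen3_moe, qwen3_next |
| DeepSeek | deepseek_v3, deepseek_v32, deepseek_v4 |
"""

# Assumed registry contents, for illustration only.
registered = {'qwen2', 'qwen2_moe', 'qwen3', 'qwen3_moe', 'qwen3_next',
              'deepseek_v3', 'deepseek_v32'}

print(missing_registrations(sample, registered))
```

A check like this could run in CI against the real registry, so that a README-only change fails until the implementation lands.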
2 changes: 1 addition & 1 deletion README_zh.md
@@ -123,7 +123,7 @@ uv pip install -e . --torch-backend=auto
| 系列 | model_type |
| -------- | ------------------------------------------------------------ |
| Qwen | qwen2, qwen2_moe<br />qwen3, qwen3_moe, qwen3_next |
-| DeepSeek | deepseek_v3, deepseek_v32 |
+| DeepSeek | deepseek_v3, deepseek_v32, deepseek_v4 |
| GLM | glm4, glm4_moe, glm4_moe_lite<br />glm_moe_dsa |
| MiniMax | minimax_m2 |
| Kimi | kimi_k2, kimi_k25 |
Empty file.
20 changes: 13 additions & 7 deletions src/mcore_bridge/model/modules/transformer_layer.py
@@ -191,13 +191,19 @@ def can_recompute_pre_mlp_layernorm_for_cudagraph():
             if 'mlp' in self.config.recompute_modules:
                 if not self.is_moe_layer:
                     self.recompute_mlp = True
-        if hasattr(self.config, 'fine_grained_activation_offloading'):
-            self.offload_attn_norm = (
-                self.config.fine_grained_activation_offloading and 'attn_norm' in self.config.offload_modules
-                and not isinstance(self.input_layernorm, IdentityOp))
-            self.offload_mlp_norm = (
-                self.config.fine_grained_activation_offloading and 'mlp_norm' in self.config.offload_modules
-                and not isinstance(self.pre_mlp_layernorm, IdentityOp))
+        if hasattr(self, '_set_offload_modules'):
+            from megatron.core.transformer.transformer_layer import _get_offloading_interface
+            self._set_offload_modules()
+            self.off_interface = _get_offloading_interface()
+            self.mlp_norm_manager = None
Comment on lines +196 to +200

Priority: high

The initialization of offloading managers for Megatron-Core 0.17+ is incomplete. Setting self.mlp_norm_manager = None without assigning a manager from self.off_interface effectively disables offloading for the MLP layer normalization, even when it is configured in offload_modules. Additionally, self.attn_norm_manager should also be initialized to avoid potential AttributeError in base class methods that expect it to be present in newer versions of Megatron-Core.

Also, the local import of _get_offloading_interface inside __init__ is inefficient as it executes for every layer instantiation; consider moving it to the top of the file if possible.

        if hasattr(self, '_set_offload_modules'):
            from megatron.core.transformer.transformer_layer import _get_offloading_interface
            self._set_offload_modules()
            self.off_interface = _get_offloading_interface()
            offload_modules = getattr(self.config, 'offload_modules', []) or []
            is_offloading = getattr(self.config, 'fine_grained_activation_offloading', False)
            self.attn_norm_manager = self.off_interface.get_manager('attn_norm') if is_offloading and 'attn_norm' in offload_modules else None
            self.mlp_norm_manager = self.off_interface.get_manager('mlp_norm') if is_offloading and 'mlp_norm' in offload_modules else None

+        else:
+            if hasattr(self.config, 'fine_grained_activation_offloading'):
+                self.offload_attn_norm = (
+                    self.config.fine_grained_activation_offloading and 'attn_norm' in self.config.offload_modules
+                    and not isinstance(self.input_layernorm, IdentityOp))
+                self.offload_mlp_norm = (
+                    self.config.fine_grained_activation_offloading and 'mlp_norm' in self.config.offload_modules
+                    and not isinstance(self.pre_mlp_layernorm, IdentityOp))

         # @jcasper how should we handle nvfuser?
         # Set bias+dropout+add fusion grad_enable execution handler.