Skip to content

[model] Support deepseek-v4 #86

Merged
Jintao-Huang merged 24 commits into
modelscope:mainfrom
Jintao-Huang:support_deepseek_v4_
May 25, 2026
Merged

[model] Support deepseek-v4 #86
Jintao-Huang merged 24 commits into
modelscope:mainfrom
Jintao-Huang:support_deepseek_v4_

Conversation

@Jintao-Huang
Copy link
Copy Markdown
Collaborator

@Jintao-Huang Jintao-Huang commented May 20, 2026

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds documentation for deepseek_v4 support and updates the transformer layer to integrate with newer Megatron-Core offloading interfaces. Feedback indicates that the offloading implementation is incomplete, as it fails to properly initialize the normalization managers, and the local import should be moved for efficiency. Additionally, the implementation for the deepseek_v4 model appears to be missing from the current changes.

Comment on lines +194 to +198
if hasattr(self, '_set_offload_modules'):
from megatron.core.transformer.transformer_layer import _get_offloading_interface
self._set_offload_modules()
self.off_interface = _get_offloading_interface()
self.mlp_norm_manager = None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The initialization of offloading managers for Megatron-Core 0.17+ is incomplete. Setting self.mlp_norm_manager = None without assigning a manager from self.off_interface effectively disables offloading for the MLP layer normalization, even when it is configured in offload_modules. Additionally, self.attn_norm_manager should also be initialized to avoid potential AttributeError in base class methods that expect it to be present in newer versions of Megatron-Core.

Also, the local import of _get_offloading_interface inside __init__ is inefficient as it executes for every layer instantiation; consider moving it to the top of the file if possible.

        if hasattr(self, '_set_offload_modules'):
            from megatron.core.transformer.transformer_layer import _get_offloading_interface
            self._set_offload_modules()
            self.off_interface = _get_offloading_interface()
            offload_modules = getattr(self.config, 'offload_modules', []) or []
            is_offloading = getattr(self.config, 'fine_grained_activation_offloading', False)
            self.attn_norm_manager = self.off_interface.get_manager('attn_norm') if is_offloading and 'attn_norm' in offload_modules else None
            self.mlp_norm_manager = self.off_interface.get_manager('mlp_norm') if is_offloading and 'mlp_norm' in offload_modules else None

Comment thread README.md
| -------- | ------------------------------------------------------------ |
| Qwen | qwen2, qwen2_moe<br />qwen3, qwen3_moe, qwen3_next |
| DeepSeek | deepseek_v3, deepseek_v32 |
| DeepSeek | deepseek_v3, deepseek_v32, deepseek_v4 |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The PR adds deepseek_v4 to the list of supported models, but the actual implementation appears to be missing. The file src/mcore_bridge/model/gpts/deepseek_v4.py is empty in the provided context, and there are no changes to model registration or configuration logic to support this new model type. Please ensure the implementation is included or clarify if it relies on an existing model type.

@Jintao-Huang
Copy link
Copy Markdown
Collaborator Author

截屏2026-05-24 15 46 33

@Jintao-Huang
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for the DeepSeek-V4 model, implementing its hybrid attention mechanism, hyper-connections, and specific configuration mappings. Key changes include updates to the state dict conversion logic, model configuration parsing, and the transformer layer's RoPE application. The reviewer identified several critical issues, including a regression in backward compatibility for existing models due to modified default argument values and a potential break in RoPE functionality when hyper-connections are enabled. Feedback also highlighted risks of runtime errors during configuration parsing and tensor concatenation, alongside a recommendation to use deep copies for module instances to prevent shared state issues.

Comment thread src/mcore_bridge/model/modules/transformer_layer.py Outdated
Comment thread src/mcore_bridge/model/register.py
Comment thread src/mcore_bridge/config/parser.py
Comment thread src/mcore_bridge/bridge/gpt_bridge.py Outdated
Comment thread src/mcore_bridge/model/gpts/deepseek_v4.py
@Jintao-Huang
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for the DeepSeek-V4 model, incorporating new configuration parameters, state dict conversion logic for hyper-connections and hash layers, and a specialized hybrid attention implementation. It also refactors RoPE application to support inverse operations and interleaving removal. Feedback focused on a critical bug where shallow copying RotaryEmbedding objects causes incorrect positional embeddings due to shared caches, and a logic error in model registration that disables necessary RoPE patching when hyper-connections are enabled. Additionally, improvements were suggested to prevent potential runtime errors during state dict conversion by handling missing attributes and null tensors.

Comment thread src/mcore_bridge/model/gpts/deepseek_v4.py
Comment thread src/mcore_bridge/model/register.py
Comment thread src/mcore_bridge/bridge/gpt_bridge.py
Comment thread src/mcore_bridge/bridge/gpt_bridge.py
@Jintao-Huang
Copy link
Copy Markdown
Collaborator Author

TODO: MTP; FP4/FP8 load/save; shell

@Jintao-Huang Jintao-Huang merged commit 35e9b51 into modelscope:main May 25, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants