[model] Support deepseek-v4 by Jintao-Huang · Pull Request #86 · modelscope/mcore-bridge

Jintao-Huang · 2026-05-20T05:48:01Z

huggingface/transformers#45643
NVIDIA/Megatron-LM#4458
modelscope/ms-swift#9386

gemini-code-assist

Code Review

This pull request adds documentation for deepseek_v4 support and updates the transformer layer to integrate with newer Megatron-Core offloading interfaces. Feedback indicates that the offloading implementation is incomplete, as it fails to properly initialize the normalization managers, and the local import should be moved for efficiency. Additionally, the implementation for the deepseek_v4 model appears to be missing from the current changes.

gemini-code-assist · 2026-05-20T05:50:51Z

+        if hasattr(self, '_set_offload_modules'):
+            from megatron.core.transformer.transformer_layer import _get_offloading_interface
+            self._set_offload_modules()
+            self.off_interface = _get_offloading_interface()
+            self.mlp_norm_manager = None


The initialization of offloading managers for Megatron-Core 0.17+ is incomplete. Setting self.mlp_norm_manager = None without assigning a manager from self.off_interface effectively disables offloading for the MLP layer normalization, even when it is configured in offload_modules. Additionally, self.attn_norm_manager should be initialized as well, to avoid a potential AttributeError in base class methods that expect it to be present in newer versions of Megatron-Core.

Also, the local import of _get_offloading_interface inside __init__ is inefficient as it executes for every layer instantiation; consider moving it to the top of the file if possible.

        if hasattr(self, '_set_offload_modules'):
            from megatron.core.transformer.transformer_layer import _get_offloading_interface
            self._set_offload_modules()
            self.off_interface = _get_offloading_interface()
            offload_modules = getattr(self.config, 'offload_modules', []) or []
            is_offloading = getattr(self.config, 'fine_grained_activation_offloading', False)
            self.attn_norm_manager = self.off_interface.get_manager('attn_norm') if is_offloading and 'attn_norm' in offload_modules else None
            self.mlp_norm_manager = self.off_interface.get_manager('mlp_norm') if is_offloading and 'mlp_norm' in offload_modules else None
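
On the import-placement point in this comment: Python caches loaded modules in sys.modules, so a function-local import pays the full load cost only on its first execution; later executions are a cache lookup plus a name binding. The sketch below measures that difference using the standard-library json module as a stand-in (it does not import Megatron-Core, and the function names are illustrative only).

```python
import sys
import timeit

import json as _json_top  # module-level import, bound once at load time


def local_import():
    # Function-local import: after the first call this resolves through
    # the sys.modules cache instead of reloading the module.
    import json
    return json


def top_level_import():
    # No import statement runs here; the name is already bound.
    return _json_top


first = local_import()
second = local_import()
same_object = first is second and first is sys.modules["json"]

local_time = timeit.timeit(local_import, number=100_000)
top_time = timeit.timeit(top_level_import, number=100_000)

print(f"same module object on every call: {same_object}")
print(f"local import: {local_time:.4f}s, top-level name: {top_time:.4f}s (100k calls each)")
```

The per-call overhead of the local import is typically small compared with constructing a transformer layer, so moving it to the top of the file is mostly a readability choice unless the import is kept local on purpose, for example to tolerate older Megatron-Core versions that lack the symbol.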

gemini-code-assist · 2026-05-20T05:50:51Z

 | -------- | ------------------------------------------------------------ |
 | Qwen     | qwen2, qwen2_moe<br />qwen3, qwen3_moe, qwen3_next |
-| DeepSeek | deepseek_v3, deepseek_v32                                    |
+| DeepSeek | deepseek_v3, deepseek_v32, deepseek_v4                        |


The PR adds deepseek_v4 to the list of supported models, but the actual implementation appears to be missing. The file src/mcore_bridge/model/gpts/deepseek_v4.py is empty in the provided context, and there are no changes to model registration or configuration logic to support this new model type. Please ensure the implementation is included or clarify if it relies on an existing model type.

Jintao-Huang · 2026-05-24T07:46:44Z

Jintao-Huang · 2026-05-24T07:47:09Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for the DeepSeek-V4 model, implementing its hybrid attention mechanism, hyper-connections, and specific configuration mappings. Key changes include updates to the state dict conversion logic, model configuration parsing, and the transformer layer's RoPE application. The reviewer identified several critical issues, including a regression in backward compatibility for existing models due to modified default argument values and a potential break in RoPE functionality when hyper-connections are enabled. Feedback also highlighted risks of runtime errors during configuration parsing and tensor concatenation, alongside a recommendation to use deep copies for module instances to prevent shared state issues.
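
The shared-state risk behind the deep-copy recommendation can be sketched in plain Python. CachedRotary below is a hypothetical stand-in for a module that fills a cache lazily; whether the modules in this PR cache values in this exact way is an assumption.

```python
import copy


class CachedRotary:
    """Hypothetical stand-in for a module with a lazily filled cache."""

    def __init__(self, base):
        self.base = base
        self.cache = {}  # maps seq_len -> list of per-position values

    def freqs(self, seq_len):
        # Compute once per seq_len, then serve from the cache.
        if seq_len not in self.cache:
            self.cache[seq_len] = [pos / self.base for pos in range(seq_len)]
        return self.cache[seq_len]


original = CachedRotary(base=10000.0)

# Both copies are meant to use a different base than the original.
shallow = copy.copy(original)
shallow.base = 160000.0
deep = copy.deepcopy(original)
deep.base = 160000.0

original.freqs(4)  # fills the cache using base=10000.0

shallow_shares_cache = shallow.cache is original.cache  # True: same dict object
deep_shares_cache = deep.cache is original.cache        # False: independent dict

shallow_result = shallow.freqs(4)  # stale values computed with the original base
deep_result = deep.freqs(4)        # values computed with the copy's own base

print(shallow_shares_cache, deep_shares_cache)
print(shallow_result[1], deep_result[1])
```

copy.copy duplicates only the top-level attribute dictionary, so the mutable cache stays shared between instances; copy.deepcopy gives each instance its own cache.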

Jintao-Huang · 2026-05-25T08:25:01Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces support for the DeepSeek-V4 model, incorporating new configuration parameters, state dict conversion logic for hyper-connections and hash layers, and a specialized hybrid attention implementation. It also refactors RoPE application to support inverse operations and interleaving removal. Feedback focused on a critical bug where shallow copying RotaryEmbedding objects causes incorrect positional embeddings due to shared caches, and a logic error in model registration that disables necessary RoPE patching when hyper-connections are enabled. Additionally, improvements were suggested to prevent potential runtime errors during state dict conversion by handling missing attributes and null tensors.
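
The inverse RoPE operation mentioned in this summary can be illustrated with a minimal pure-Python sketch: rotating each pair of features by an angle and then by the negated angle recovers the input. The function names, the base value, and the reading of "interleaving removal" as a reorder from interleaved pairs to a half-split layout are assumptions for illustration, not the PR's actual code.

```python
import math


def apply_rope(x, position, base=10000.0, inverse=False):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by position-dependent angles.

    With inverse=True the rotation angle is negated, undoing a prior call.
    """
    dim = len(x)
    assert dim % 2 == 0, "feature dimension must be even"
    sign = -1.0 if inverse else 1.0
    out = [0.0] * dim
    for i in range(dim // 2):
        theta = sign * position * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def deinterleave(x):
    """Reorder [a0, b0, a1, b1, ...] into [a0, a1, ..., b0, b1, ...]."""
    return x[0::2] + x[1::2]


vec = [1.0, 2.0, 3.0, 4.0]
rotated = apply_rope(vec, position=5)
restored = apply_rope(rotated, position=5, inverse=True)
round_trip_ok = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(vec, restored))

print(round_trip_ok)
print(deinterleave(vec))
```

Because each pair undergoes a plane rotation, the inverse needs no separate table of values: the same cosine and sine terms are reused with the sine sign flipped.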

Jintao-Huang · 2026-05-25T08:31:45Z

TODO: MTP; FP4/FP8 load/save; shell

Jintao-Huang added 2 commits May 20, 2026 11:47

support deepseek_v4

8403e3c

update

358e51f

gemini-code-assist Bot reviewed May 20, 2026

Jintao-Huang added 2 commits May 20, 2026 15:09

update

9ea9b44

update

8b6898b

Jintao-Huang mentioned this pull request May 20, 2026

[megatron] Support deepseek-v4 megatron modelscope/ms-swift#9386

Open

Jintao-Huang added 11 commits May 20, 2026 17:01

update

8816e00

Merge branch 'main' into support_deepseek_v4_

a89d118

update

95d5562

Merge branch 'main' into support_deepseek_v4_

e8db566

fix

f4e96b8

update

517c2f3

update

00e2adc

Merge branch 'main' into support_deepseek_v4_

971d8cf

update

872c1e3

update

0b0291c

lint pass

15cce39

gemini-code-assist Bot reviewed May 24, 2026

Comment thread src/mcore_bridge/model/modules/transformer_layer.py Outdated

Comment thread src/mcore_bridge/model/register.py

Comment thread src/mcore_bridge/config/parser.py

Comment thread src/mcore_bridge/bridge/gpt_bridge.py Outdated

Comment thread src/mcore_bridge/model/gpts/deepseek_v4.py

hjh0119 approved these changes May 24, 2026

Jintao-Huang added 6 commits May 24, 2026 17:03

fix

d0f556f

update

a5d4593

Merge branch 'main' into support_deepseek_v4_

45b2073

update

9b5c1d0

fix

5323467

fix

0788ddb

gemini-code-assist Bot reviewed May 25, 2026

Comment thread src/mcore_bridge/model/gpts/deepseek_v4.py

Comment thread src/mcore_bridge/model/register.py

Comment thread src/mcore_bridge/bridge/gpt_bridge.py

Comment thread src/mcore_bridge/bridge/gpt_bridge.py

Jintao-Huang added 3 commits May 25, 2026 16:38

update

4c4b985

update

140566d

Merge remote-tracking branch 'refs/remotes/origin/support_deepseek_v4…

eedaa98

…_' into support_deepseek_v4_

Jintao-Huang merged commit 35e9b51 into modelscope:main May 25, 2026
1 check passed
