Releases: modelscope/mcore-bridge

v1.4.0

17 May 15:50

New Features

  1. Added model_type support for bailing_moe and qwen3_asr.
  2. Qwen3-Next now runs with Mcore-GDN by default, which enables sequence packing, FP8, and context parallelism (CP).
  3. Refactored transformer_block / transformer_layer with an inheritable design to simplify the integration of new models.
  4. Added compatibility with Python 3.13.
  5. Added support for saving and loading LoRA weights for MoE models whose experts are organized in grouped form in transformers. (Note: these LoRA weights cannot be loaded directly via transformers, but they can be loaded via Megatron to continue training.)
  6. Added padding_mask support, fixing an issue where moe_aux_loss incorrectly computed routing loss on padding tokens when padding_free=False.
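
The effect described in item 6 can be illustrated with a small, framework-free sketch. It assumes a Switch-style load-balancing loss (number of experts times the sum, over experts, of routed-token fraction times mean router probability) and a mask convention where True marks a padding token. The function name `load_balancing_loss` and its arguments are hypothetical and are not taken from mcore-bridge or Megatron-Core.

```python
def load_balancing_loss(router_probs, padding_mask=None, top_k=1):
    """Toy Switch-style auxiliary loss: num_experts * sum_e(frac_e * mean_prob_e).

    router_probs: list of per-token lists of expert probabilities.
    padding_mask: list of bools, True = padding token (assumed convention).
    """
    num_experts = len(router_probs[0])
    if padding_mask is None:
        padding_mask = [False] * len(router_probs)

    # Keep only real tokens so padding does not affect the routing statistics.
    valid = [p for p, is_pad in zip(router_probs, padding_mask) if not is_pad]
    if not valid:
        return 0.0

    n = len(valid)
    counts = [0] * num_experts
    mean_prob = [0.0] * num_experts
    for probs in valid:
        # Experts selected for this token (top-k by router probability).
        chosen = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in chosen:
            counts[e] += 1
        for e in range(num_experts):
            mean_prob[e] += probs[e] / n

    frac = [c / (n * top_k) for c in counts]
    return num_experts * sum(f * p for f, p in zip(frac, mean_prob))


# Two real tokens routed evenly, followed by two padding tokens that all
# favor expert 0 (as padded positions in a batch might).
probs = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.1]]
mask = [False, False, True, True]
masked_loss = load_balancing_loss(probs, mask)
unmasked_loss = load_balancing_loss(probs)
```

In this toy batch the real tokens are perfectly balanced, so the masked loss sits at its minimum of about 1.0, while the unmasked loss rises to about 1.2 because the padding tokens skew the statistics toward expert 0.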

What's Changed

Full Changelog: v1.3.0...v1.4.0

Patch release v1.3.2

12 May 14:41

Patch release v1.3.1

10 May 05:29

v1.3.0

07 May 02:51

New Features

  1. Added model_type support: kimi_k25, hy_v3, llava_onevision.
  2. mlp_padding_free is now compatible with Sequence Parallelism.
  3. Dropped support for megatron-core versions 0.12 through 0.14.

What's Changed

Full Changelog: v1.2.0...v1.3.0

Patch release v1.2.3

05 May 13:51

Patch release v1.2.2

04 May 09:52

Patch release v1.2.1

25 Apr 06:46

v1.2.0

23 Apr 07:20

New Features

  1. Added support for GLM-5 shared-weight MTP, which can be enabled via the mtp_shared_weights argument.
  2. Added support for Qwen3.5 FP8 training and FP8 weight import/export.
  3. Added support for controlling whether gradients are stopped at decoder_input in the MTP branch, i.e., whether the MTP loss can be back-propagated through decoder_input to Embedding/ViT. This can be configured via the mtp_decoder_input_detach argument.
  4. Added compatibility with megatron-core 0.15.3 for training on Huawei Ascend NPUs.
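
What stopping the gradient at decoder_input means (item 3) can be sketched with a tiny scalar autograd written from scratch. This is an illustration only: the `Value` class, the `embedding_grad` helper, and the numeric constants are hypothetical, and the sketch does not reflect how mcore-bridge implements mtp_decoder_input_detach internally.

```python
class Value:
    """Minimal scalar autograd node, just enough to show stop-gradient."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # tuple of (parent_node, local_gradient)

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def detach(self):
        # Same value, but with no link back to the graph that produced it.
        return Value(self.data)

    def backward(self, upstream=1.0):
        # Accumulate the gradient arriving along this path, then pass it on.
        self.grad += upstream
        for parent, local in self._parents:
            parent.backward(upstream * local)


def embedding_grad(detach_decoder_input):
    """Return d(total_loss)/d(w), where w stands in for an Embedding/ViT weight."""
    w = Value(2.0)
    x = Value(3.0)
    decoder_input = w * x

    main_loss = decoder_input * Value(0.5)  # main language-model branch
    mtp_input = decoder_input.detach() if detach_decoder_input else decoder_input
    mtp_loss = mtp_input * Value(0.1)       # MTP branch

    total = main_loss + mtp_loss
    total.backward()
    return w.grad


grad_detached = embedding_grad(detach_decoder_input=True)
grad_attached = embedding_grad(detach_decoder_input=False)
```

With the detach enabled, only the main loss contributes to the gradient of `w` (about 1.5 here). Without it, the MTP loss adds its own contribution through decoder_input (about 1.8 in total).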

What's Changed

Full Changelog: v1.1.0...v1.2.0

Patch release v1.1.2

18 Apr 14:38

Patch release v1.1.1

12 Apr 12:04
