
Conversation

@Fzilan (Collaborator) commented Oct 25, 2025

Based on #1387 (the transformers 4.54.1 base PR).
Add the smollm3 model.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guidelines?
  • Did you make sure to update the documentation with your changes? E.g., record bug fixes or new features in What's New (see the documentation guidelines).
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (It is best to record this for data loading, model inference, or training tasks.)
  • Did you write any new necessary tests?

wcrzlh and others added 30 commits August 18, 2025 15:22
@Fzilan Fzilan requested a review from vigo999 as a code owner October 25, 2025 04:03
@gemini-code-assist bot commented

Summary of Changes

Hello @Fzilan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive support for the Qwen3-VL multimodal vision-language model series within the mindone.transformers library. It includes the integration of both dense and Mixture-of-Experts (MoE) model architectures, accompanied by dedicated processing utilities for images and videos. Key architectural advancements like DeepStack integration and specialized parallelization for MoE layers are incorporated. The changes also involve substantial refactoring and updates to core library components, such as activation functions, image processing, and attention masking, to ensure robust functionality and future compatibility.

Highlights

  • New Qwen3-VL Model Integration: Added comprehensive support for the Qwen3-VL multimodal vision-language model series, encompassing both dense and Mixture-of-Experts (MoE) variants. This includes dedicated model architectures and processing utilities.
  • Multimodal Processing Enhancements: Implemented advanced processing for image and video inputs, featuring special token handling, DeepStack integration for leveraging multi-level Vision Transformer features, and improved temporal grounding for video understanding.
  • Parallelization for MoE Models: Introduced MoeTextExperts and updated mindone.trainers.zero to enable efficient Zero-3 parallelization, specifically tailored for Mixture-of-Experts layers within the Qwen3-VL architecture.
  • Core Library Refinements: Performed significant updates across core mindone.transformers components, including activation functions, image processing utilities, attention masking, and generation mechanisms, to enhance MindSpore compatibility and overall flexibility.
  • Updated Transformers Version: The internal mindone.transformers version has been updated to 4.54.1, reflecting broader compatibility and feature updates across the library.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the Qwen3-VL model and includes a massive synchronization with the upstream transformers library (v4.54.1). The changes are extensive, touching many parts of the codebase, including model implementations, utility functions, and examples. The refactoring efforts to adopt newer APIs (like mint and Cache objects) and Python 3.9+ typing are commendable and improve code quality. I've identified a few issues, mainly related to typos in documentation and a potentially fragile implementation detail in the ZeRO parallelism logic. My main concern is the change in mindone/trainers/zero.py, which uses a parameter name check that could be brittle. Overall, this is a significant and valuable update.

Comment on lines +483 to +484

```python
if net.trainable_params():
    if "gate_up_proj" in net.trainable_params()[0].name:
```

Severity: high

The condition if "gate_up_proj" in net.trainable_params()[0].name: is quite fragile as it depends on the name of the first trainable parameter. This could easily break if the model architecture or parameter order changes. A more robust approach would be to check for the attribute directly on the network object, for instance, using hasattr(net, 'gate_up_proj').
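
For illustration, here is a minimal sketch of the attribute-based check (the function names are hypothetical, not the actual mindone/trainers/zero.py code):

```python
# Hypothetical sketch, not the actual mindone implementation: contrasting the
# fragile name-based test with the attribute-based check suggested above.
def is_moe_experts_cell_fragile(net) -> bool:
    # Current approach: breaks if the parameter order changes or the cell
    # exposes no trainable parameters at all.
    params = net.trainable_params()
    return bool(params) and "gate_up_proj" in params[0].name

def is_moe_experts_cell(net) -> bool:
    # Suggested approach: does not depend on parameter naming or ordering.
    return hasattr(net, "gate_up_proj")
```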

```python
# if there are no pad tokens present, then add eos to the end
modified_input_ids[i] = mint.nn.functional.pad(each_input_id, (0, 1), value=eos_token_id)
```

Severity: high

The logic for handling padding tokens when adding an EOS token has been removed. The new implementation unconditionally pads the eos_token_id at the end of the sequence. If input_ids can contain padding tokens, this change might lead to incorrect placement of the EOS token (i.e., after padding). Please confirm if input_ids passed to this function are guaranteed not to have padding tokens. If they can contain padding, the original logic to insert the EOS token before the first padding token should be restored to prevent this potential issue.
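
For reference, a minimal sketch of pad-aware EOS insertion (the function name, signature, and pad-handling details are assumptions, not the original implementation):

```python
import mindspore as ms
from mindspore import mint

# Assumed sketch of the pad-aware behavior described above: if padding is
# present, the EOS token is placed right after the last real token instead
# of being appended after the pad tokens.
def add_eos(each_input_id: ms.Tensor, eos_token_id: int, pad_token_id: int) -> ms.Tensor:
    pad_positions = (each_input_id == pad_token_id).nonzero()
    if pad_positions.shape[0] == 0:
        # No pad tokens: appending EOS at the end is safe, as the new code does.
        return mint.nn.functional.pad(each_input_id, (0, 1), value=eos_token_id)
    # Pad tokens present: overwrite the first pad position with EOS.
    each_input_id[int(pad_positions[0, 0])] = eos_token_id
    return each_input_id
```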

Comment on lines +61 to +74

```
an endangered wild feline species native to Central Aisa.
...
**Appearance:** It has a stocky and robust build with short legs
and a large head relative to its body size. Its fur is thick and dense,
appearing somewhat fluffy or "matted,", which is characteristic']
```

Qwen3-VL-30B Outputs:

```
['Of course, here is detailed description of the image provided.\n\n
This is a dynamic and charming photograph of a Palla's cat (also known as a manul) in a snowy enviroment.
...
"Appearance:" The cat has a very distinctive apperance, characterized by its stocky, low-slung body and exceptionally
thick, dense fur. This coat is a mix of brownish"]
```

Severity: medium

I've noticed a few typos in the model output examples. Correcting them will improve the documentation's clarity and professionalism.

  • Line 61: Aisa should be Asia.
  • Line 71: Palla's should be Pallas's and enviroment should be environment.
  • Line 73: apperance should be appearance.
