feat(transformers): add smollm3 (v4.54.1) #1391
base: master
Conversation
…ils device setting bug
feat(transformers): support qwen3-vl series
fix(transformers): fix typos in qwen3_vl docs
…e_search/group_beam_search/constrained beam search
Summary of Changes

Hello @Fzilan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces comprehensive support for the Qwen3-VL multimodal vision-language model series.
Code Review
This pull request introduces support for the Qwen3-VL model and includes a massive synchronization with the upstream transformers library (v4.54.1). The changes are extensive, touching many parts of the codebase, including model implementations, utility functions, and examples. The refactoring efforts to adopt newer APIs (like mint and Cache objects) and Python 3.9+ typing are commendable and improve code quality. I've identified a few issues, mainly related to typos in documentation and a potentially fragile implementation detail in the ZeRO parallelism logic. My main concern is the change in mindone/trainers/zero.py, which uses a parameter name check that could be brittle. Overall, this is a significant and valuable update.
```python
if net.trainable_params():
    if "gate_up_proj" in net.trainable_params()[0].name:
```
The condition if "gate_up_proj" in net.trainable_params()[0].name: is quite fragile as it depends on the name of the first trainable parameter. This could easily break if the model architecture or parameter order changes. A more robust approach would be to check for the attribute directly on the network object, for instance, using hasattr(net, 'gate_up_proj').
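A minimal sketch of the suggested attribute-based check (the `DummyNet` class and `uses_fused_gate_up` helper below are hypothetical illustrations, not code from this repository):

```python
class DummyNet:
    """Hypothetical stand-in for the real network object."""

    def __init__(self, has_gate_up_proj: bool):
        if has_gate_up_proj:
            # Placeholder for the fused gate/up projection layer.
            self.gate_up_proj = object()


def uses_fused_gate_up(net) -> bool:
    # Check the attribute on the network object itself instead of relying
    # on the name of the first trainable parameter, which can change with
    # architecture or parameter ordering.
    return hasattr(net, "gate_up_proj")


print(uses_fused_gate_up(DummyNet(True)))   # True
print(uses_fused_gate_up(DummyNet(False)))  # False
```

This keeps the ZeRO branch decoupled from parameter registration order.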
```python
# if there are no pad tokens present, then add eos to the end
modified_input_ids[i] = mint.nn.functional.pad(each_input_id, (0, 1), value=eos_token_id)
```
The logic for handling padding tokens when adding an EOS token has been removed. The new implementation unconditionally pads the eos_token_id at the end of the sequence. If input_ids can contain padding tokens, this change might lead to incorrect placement of the EOS token (i.e., after padding). Please confirm if input_ids passed to this function are guaranteed not to have padding tokens. If they can contain padding, the original logic to insert the EOS token before the first padding token should be restored to prevent this potential issue.
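The pad-aware behavior the review asks to preserve can be sketched in plain Python on list-based token sequences (the `append_eos` helper and its parameters are hypothetical names for illustration, not the library implementation):

```python
def append_eos(input_ids: list[int], eos_token_id: int, pad_token_id: int) -> list[int]:
    """Place the EOS token before any padding rather than after it.

    If padding is present, overwrite the first pad position with EOS so the
    sequence length is unchanged; otherwise append EOS at the end.
    """
    if pad_token_id in input_ids:
        idx = input_ids.index(pad_token_id)
        # Replace the first pad token so EOS is not emitted after padding.
        return input_ids[:idx] + [eos_token_id] + input_ids[idx + 1:]
    return input_ids + [eos_token_id]


print(append_eos([5, 6, 7], eos_token_id=2, pad_token_id=0))     # [5, 6, 7, 2]
print(append_eos([5, 6, 0, 0], eos_token_id=2, pad_token_id=0))  # [5, 6, 2, 0]
```

Unconditionally padding EOS at the end, as in the new code, produces the second input's EOS after the pad tokens instead.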
```
an endangered wild feline species native to Central Asia.
...
**Appearance:** It has a stocky and robust build with short legs
and a large head relative to its body size. Its fur is thick and dense,
appearing somewhat fluffy or "matted," which is characteristic']
```

Qwen3-VL-30B Outputs:
```
['Of course, here is a detailed description of the image provided.\n\n
This is a dynamic and charming photograph of a Pallas's cat (also known as a manul) in a snowy environment.
...
"Appearance:" The cat has a very distinctive appearance, characterized by its stocky, low-slung body and exceptionally
thick, dense fur. This coat is a mix of brownish"]
```
Based on #1387 (transformers v4.54.1 base PR).
Add the smollm3 model
Before submitting
What's New. Here are the documentation guidelines.