Conversation
Pull request overview
This PR adds initial (WIP) support for Qwen3.5 multimodal models across LMDeploy’s VL adapter layer and PyTorch engine, including module mappings and configuration building, plus documentation updates.
Changes:
- Introduces new Qwen3.5 PyTorch model implementations (dense + MoE) and registers them in the model module map.
- Adds a VL model wrapper for Qwen3.5 and registers it in the VL model builder.
- Adds a Qwen3.5 config builder and updates docs/README to list Qwen3.5 as supported.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| lmdeploy/vl/model/qwen3_5.py | Adds VL-side Qwen3.5 wrapper/registration and preprocessor setup. |
| lmdeploy/vl/model/builder.py | Ensures Qwen3.5 VL wrapper is imported/registered. |
| lmdeploy/pytorch/nn/norm.py | Updates type annotations to use `X \| None`-style unions. |
| lmdeploy/pytorch/models/qwen3_5.py | Adds main PyTorch implementation for Qwen3.5 VLM (vision + text + generation utilities). |
| lmdeploy/pytorch/models/qwen3_5_moe.py | Adds MoE variant wiring and expert weight-loading logic. |
| lmdeploy/pytorch/models/module_map.py | Registers HF architecture names to LMDeploy Qwen3.5 model entrypoints. |
| lmdeploy/pytorch/configurations/qwen3_5.py | Adds config builder for Qwen3.5(+MoE), including state shapes for linear-attn layers. |
| docs/zh_cn/supported_models/supported_models.md | Adds Qwen3.5 row to supported models table. |
| docs/en/supported_models/supported_models.md | Adds Qwen3.5 row to supported models table. |
| README.md / README_zh-CN.md / README_ja.md | Lists Qwen3.5 among supported models. |
```python
@VISION_MODELS.register_module()
class Qwen3_5Model(Qwen3VLModel):
    """Qwen3_5 model."""

    _arch = ['Qwen3_5ForConditionalGeneration', 'Qwen3_5MoeForConditionalGeneration']

    def build_preprocessor(self):
        check_transformers()
        self.processor = AutoProcessor.from_pretrained(self.model_path)
        tokenizer = self.processor.tokenizer
        self.image_token = self.processor.image_token
        self.image_token_id = tokenizer.encode(self.image_token)[-1]
        self.mm_processor_kwargs = None
```
New Qwen3.5 VL preprocessor/model registration is added, but there are existing processor tests for Qwen3-VL (tests/test_lmdeploy/test_vl/test_qwen3vl_processor.py) and none for Qwen3.5. Please add an analogous unit test to validate build_preprocessor() + preprocess() behavior (including mm_processor_kwargs min/max pixel handling) so regressions are caught early.
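A minimal, dependency-free sketch of such a test could stub the processor instead of downloading one. `DummyProcessor`, the placeholder token string, and the token id `151655` below are pure assumptions for illustration, not Qwen3.5's real values:

```python
class DummyTokenizer:
    """Stub tokenizer: always returns one fake image-token id (151655 is made up)."""

    def encode(self, text):
        return [151655]


class DummyProcessor:
    image_token = '<|image_pad|>'  # placeholder token string (assumption)
    tokenizer = DummyTokenizer()


def build_preprocessor_state(processor):
    """Mirror the attribute wiring in Qwen3_5Model.build_preprocessor above."""
    return {
        'image_token': processor.image_token,
        'image_token_id': processor.tokenizer.encode(processor.image_token)[-1],
        'mm_processor_kwargs': None,
    }


state = build_preprocessor_state(DummyProcessor())
assert state['image_token_id'] == 151655
assert state['mm_processor_kwargs'] is None
```

A real test would additionally exercise `preprocess()` with min/max pixel kwargs, as the Qwen3-VL test does.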
```python
conv_state_shape = (num_delta_layers, conv_dim, conv_kernel_size)
recurrent_state_shape = (num_delta_layers, num_v_heads, head_k_dim, head_v_dim)
dtype = torch.bfloat16
```
Why not use dtype from hf_config?
`causal_conv1d` and `fla` only support bfloat16.
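The pinned dtype and the two state shapes can be sketched together. The numeric values below are illustrative assumptions, not the real fields read from `hf_config`:

```python
import torch

# Illustrative values standing in for hf_config fields (assumption).
num_delta_layers, conv_dim, conv_kernel_size = 4, 64, 4
num_v_heads, head_k_dim, head_v_dim = 2, 16, 16

conv_state_shape = (num_delta_layers, conv_dim, conv_kernel_size)
recurrent_state_shape = (num_delta_layers, num_v_heads, head_k_dim, head_v_dim)

# Per the reply above, causal_conv1d and fla only accept bfloat16,
# so the cache dtype is pinned rather than taken from hf_config.
conv_state = torch.zeros(conv_state_shape, dtype=torch.bfloat16)
recurrent_state = torch.zeros(recurrent_state_shape, dtype=torch.bfloat16)
```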
```python
# do sigmoid and float here to prevent contiguous kernel
b = b.sigmoid().flatten(-2, -1)
a = a.float().flatten(-2, -1)
```
I'm not entirely clear on the reasoning behind `b.sigmoid` and `a.float` here. Previously, these operations were performed in `forward`. Are you suggesting that the previous approach affected memory contiguity?
`a` and `b` are chunks of `in_proj_ba(hidden_states)`, which is not contiguous.
Flattening `a` and `b` directly would launch an extra transpose kernel, whereas an elementwise op automatically makes its output contiguous, so the subsequent flatten is a free view and the separate kernel can be skipped.
Text messages work well, but messages with images fail, likely due to chat template issues. The logic fails here: https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py#L704. Consequently, Qwen3.5 falls back to `BaseChatTemplate` rather than `HFChatTemplate`.
I am working on a causal-conv1d tilelang kernel.
@CUHKSZzxy I've fixed this issue. Please review again.
```python
@lru_cache
def has_tilelang():
    try:
        import tilelang  # noqa: F401
        return True
    except ImportError:
        return False
```
Should we include tilelang in requirements.txt?
```python
def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
    """Load weights."""

    def __skip_layers(name):
```
Should we keep this, or only for debugging purposes?
```html
<li>Qwen2-VL (2B, 7B, 72B)</li>
<li>Qwen2.5-VL (3B, 7B, 72B)</li>
<li>Qwen3-VL (2B - 235B)</li>
<li>Qwen3.5</li>
```
Consider adding the model sizes here, i.e. Qwen3.5 (27B - 397B), and in the other README files as well.
```python
torch.testing.assert_close(conv_state_clone, conv_state, rtol=1e-3, atol=1e-3)


class TestCausalConv1dFn:
```
May need to skip this test case if `causal_conv1d` is not installed.
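One way to implement the suggested skip, sketched with pytest's `skipif` marker (the class name follows the excerpt; the test body is a placeholder):

```python
import importlib.util

import pytest

HAS_CAUSAL_CONV1D = importlib.util.find_spec('causal_conv1d') is not None


@pytest.mark.skipif(not HAS_CAUSAL_CONV1D, reason='causal_conv1d is not installed')
class TestCausalConv1dFn:

    def test_smoke(self):
        import causal_conv1d  # only runs when the dependency is present
        assert causal_conv1d is not None
```

`pytest.importorskip('causal_conv1d')` at module scope is a shorter alternative when the whole file depends on the package.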
```python
                                 add_generation_prompt=add_generation_prompt,
                                 **kwargs)
# Remove the sentinel part.
prompt = prompt[len(self.sentinel_system_prompt):] if len(self.sentinel_system_prompt) > 0 else prompt
```
The unit test fails because `self.sentinel_system_prompt` is `None`.
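A None-safe rewrite of the sentinel stripping could look like this (`strip_sentinel` is a hypothetical helper, not the PR's code):

```python
def strip_sentinel(prompt: str, sentinel) -> str:
    """Drop the sentinel system-prompt prefix; tolerate sentinel=None or ''."""
    if sentinel and prompt.startswith(sentinel):
        return prompt[len(sentinel):]
    return prompt


print(strip_sentinel('<sys>hello', '<sys>'))  # → hello
print(strip_sentinel('hello', None))          # → hello
```

Guarding on truthiness covers both `None` and the empty string, and `startswith` avoids slicing off unrelated leading text.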
|
Evaluation test failed on commit id 6124d75 |
Support Qwen3.5-397B-A17B