Qwen3.5 #4351

Open
grimoire wants to merge 19 commits into InternLM:main from grimoire:qwen3.5

Conversation

@grimoire (Collaborator) commented Feb 11, 2026

Support Qwen3.5-397B-A17B

Copilot AI review requested due to automatic review settings February 11, 2026 10:40
@grimoire grimoire marked this pull request as draft February 11, 2026 10:44
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds initial (WIP) support for Qwen3.5 multimodal models across LMDeploy’s VL adapter layer and PyTorch engine, including module mappings and configuration building, plus documentation updates.

Changes:

  • Introduces new Qwen3.5 PyTorch model implementations (dense + MoE) and registers them in the model module map.
  • Adds a VL model wrapper for Qwen3.5 and registers it in the VL model builder.
  • Adds a Qwen3.5 config builder and updates docs/README to list Qwen3.5 as supported.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.

Show a summary per file
| File | Description |
| --- | --- |
| lmdeploy/vl/model/qwen3_5.py | Adds VL-side Qwen3.5 wrapper/registration and preprocessor setup. |
| lmdeploy/vl/model/builder.py | Ensures the Qwen3.5 VL wrapper is imported/registered. |
| lmdeploy/pytorch/nn/norm.py | Updates type annotations to use `…` |
| lmdeploy/pytorch/models/qwen3_5.py | Adds the main PyTorch implementation for the Qwen3.5 VLM (vision + text + generation utilities). |
| lmdeploy/pytorch/models/qwen3_5_moe.py | Adds MoE variant wiring and expert weight-loading logic. |
| lmdeploy/pytorch/models/module_map.py | Registers HF architecture names to LMDeploy Qwen3.5 model entrypoints. |
| lmdeploy/pytorch/configurations/qwen3_5.py | Adds config builder for Qwen3.5(+MoE), including state shapes for linear-attn layers. |
| docs/zh_cn/supported_models/supported_models.md | Adds Qwen3.5 row to supported models table. |
| docs/en/supported_models/supported_models.md | Adds Qwen3.5 row to supported models table. |
| README.md / README_zh-CN.md / README_ja.md | Lists Qwen3.5 among supported models. |


Comment on lines +20 to +32
@VISION_MODELS.register_module()
class Qwen3_5Model(Qwen3VLModel):
    """Qwen3_5 model."""

    _arch = ['Qwen3_5ForConditionalGeneration', 'Qwen3_5MoeForConditionalGeneration']

    def build_preprocessor(self):
        check_transformers()
        self.processor = AutoProcessor.from_pretrained(self.model_path)
        tokenizer = self.processor.tokenizer
        self.image_token = self.processor.image_token
        self.image_token_id = tokenizer.encode(self.image_token)[-1]
        self.mm_processor_kwargs = None
Copilot AI Feb 11, 2026

New Qwen3.5 VL preprocessor/model registration is added, but there are existing processor tests for Qwen3-VL (tests/test_lmdeploy/test_vl/test_qwen3vl_processor.py) and none for Qwen3.5. Please add an analogous unit test to validate build_preprocessor() + preprocess() behavior (including mm_processor_kwargs min/max pixel handling) so regressions are caught early.
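A minimal sketch of the lookup such a test could exercise, using a fake tokenizer; the class, token string, and id below are made-up stand-ins, not the real test API or vocabulary:

```python
# Hypothetical test sketch for the image-token-id lookup in build_preprocessor().
class FakeTokenizer:
    def encode(self, text):
        # pretend the image token encodes to a sequence ending in id 9999
        return [1, 9999] if text == '<|image_pad|>' else [0]

def resolve_image_token_id(tokenizer, image_token):
    # mirrors the wrapper's lookup: take the last id of the encoded token
    return tokenizer.encode(image_token)[-1]

assert resolve_image_token_id(FakeTokenizer(), '<|image_pad|>') == 9999
```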

@grimoire grimoire changed the title [WIP] Qwen3.5 Qwen3.5 Feb 16, 2026
@grimoire grimoire marked this pull request as ready for review February 16, 2026 12:32
@lvhan028 lvhan028 added the enhancement New feature or request label Feb 24, 2026
@lvhan028 lvhan028 requested a review from CUHKSZzxy February 24, 2026 04:45

conv_state_shape = (num_delta_layers, conv_dim, conv_kernel_size)
recurrent_state_shape = (num_delta_layers, num_v_heads, head_k_dim, head_v_dim)
dtype = torch.bfloat16
Collaborator

Why not use dtype from hf_config?

Collaborator Author

causal_conv1d and fla only support bfloat16
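For illustration, the two state caches from the snippet above could be allocated like this; the sizes are placeholders (not the real Qwen3.5 config values), and the dtype is pinned to bfloat16 because the kernels require it:

```python
import torch

# Placeholder sizes for illustration only; real values come from hf_config.
num_delta_layers, conv_dim, conv_kernel_size = 4, 512, 4
num_v_heads, head_k_dim, head_v_dim = 8, 64, 64

# dtype is hard-coded: causal_conv1d and fla kernels only support bfloat16.
conv_state = torch.zeros(num_delta_layers, conv_dim, conv_kernel_size,
                         dtype=torch.bfloat16)
recurrent_state = torch.zeros(num_delta_layers, num_v_heads, head_k_dim,
                              head_v_dim, dtype=torch.bfloat16)
```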

@windreamer windreamer linked an issue Feb 25, 2026 that may be closed by this pull request
@lvhan028 lvhan028 requested a review from RunningLeon February 25, 2026 07:11
@lvhan028 (Collaborator)

Can we put flash-linear-attention in requirements? If so, the fla installation hint can be removed from qwen3_next.py

@lvhan028 (Collaborator)

The causal-conv1d installation should be included in the Dockerfile, and the transformers constraints need to be removed. I'll handle this in a separate PR.

Comment on lines +139 to +141
# do sigmoid and float here to prevent contiguous kernel
b = b.sigmoid().flatten(-2, -1)
a = a.float().flatten(-2, -1)
Collaborator

I'm not entirely clear on the reasoning behind b.sigmoid and a.float here. Previously, these operations were performed in forward. Are you suggesting that the previous approach affected memory contiguity?

Collaborator Author

a and b are chunks of in_proj_ba(hidden_states), which are not contiguous.
Flattening a and b directly would require another transpose kernel. An elementwise op automatically produces a contiguous output, so we can skip the flatten kernel.
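The effect can be demonstrated in isolation; the shapes below are made up and only stand in for the fused-projection output:

```python
import torch

# Illustrative stand-in for in_proj_ba(hidden_states): one fused projection
# whose output is chunked into b and a along the last dim.
ba = torch.randn(2, 8, 4, 12)          # (batch, seq, heads, 2 * head_dim)
b, a = ba.chunk(2, dim=-1)             # chunks are views -> non-contiguous
assert not b.is_contiguous()

# flatten(-2, -1) on the non-contiguous view cannot be a pure view, so it
# would launch an extra copy/transpose kernel. Running the elementwise op
# first allocates a fresh contiguous output, after which flatten is free.
b = b.sigmoid()                        # elementwise -> contiguous output
assert b.is_contiguous()
b = b.flatten(-2, -1)                  # zero-copy reshape now
a = a.float().flatten(-2, -1)
```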

@lvhan028 (Collaborator)

> Can we put flash-linear-attention in requirements? If so, the fla installation hint can be removed from qwen3_next.py

Confirmed that pip install flash-linear-attention works. Please add it to requirements.txt and remove the installation hint from the code

@CUHKSZzxy (Collaborator)

Text message works well, message with images fails:

  File "/nvme1/zhouxinyu/miniconda3/envs/dev/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/pipeline.py", line 408, in <lambda>
    loop).add_done_callback(lambda f: None if f.cancelled() else f.result())
  File "/nvme1/zhouxinyu/miniconda3/envs/dev/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/nvme1/zhouxinyu/miniconda3/envs/dev/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/pipeline.py", line 401, in _infer
    await asyncio.gather(*tasks)
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/pipeline.py", line 377, in _sync_resp
    async for out in g:
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/serve/core/async_engine.py", line 334, in generate
    prompt_input = await self.prompt_processor.get_prompt_input(prompt=prompt,
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/serve/processors/multimodal.py", line 265, in get_prompt_input
    return await self._get_multimodal_prompt_input(messages=prompt,
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/serve/processors/multimodal.py", line 424, in _get_multimodal_prompt_input
    results = await self.vl_encoder.wrap_for_pytorch(results,
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/vl/engine.py", line 103, in wrap_for_pytorch
    result = self.model.to_pytorch(messages,
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/vl/model/qwen3.py", line 125, in to_pytorch
    prompt, IMAGE_TOKEN = self.proc_messages(messages, chat_template, sequence_start, chat_template_kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/vl/model/qwen3.py", line 114, in proc_messages
    prompt = chat_template.messages2prompt(prompt_messages, sequence_start, **chat_template_kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/model.py", line 186, in messages2prompt
    content = get_text(message['content'])
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/model.py", line 31, in get_text
    return content[0]['text']
KeyError: 'text'

Likely due to chat_template issues. The logic fails here, https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py#L704

jinja2.exceptions.TemplateError: No user query found in messages.
> /nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/model.py(705)__init__()
-> _, _, self.sentinel_system_messages, self.sentinel_system_prompt = self._role_instruction('system')

Consequently, qwen3.5 falls back to using BaseChatTemplate rather than HFChatTemplate.
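The KeyError can be reproduced in isolation; the message layout below is a hypothetical example, and get_text mirrors the helper in lmdeploy/model.py:

```python
# Hypothetical multimodal message: the image part comes before the text part.
message_content = [
    {'type': 'image_url', 'image_url': {'url': 'demo.jpg'}},
    {'type': 'text', 'text': 'describe the image'},
]

def get_text(content):
    # mirrors lmdeploy/model.py::get_text, which assumes content[0] is the text part
    return content[0]['text']

try:
    get_text(message_content)
    raised = False
except KeyError:
    raised = True   # the first item has no 'text' key
```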

@grimoire (Collaborator Author)

> The causal-conv1d installation should be included in the Dockerfile, and the transformers constraints need to be removed. I'll handle this in a separate PR.

I am working on causal-conv1d tilelang kernel.

@lvhan028 (Collaborator)

> Text message works well, message with images fails: […full traceback quoted above…]

@CUHKSZzxy I've fixed this issue. Please review again.

@lru_cache
def has_tilelang():
    try:
        import tilelang  # noqa: F401
Collaborator

Should we include tilelang in requirements.txt?

def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
    """Load weights."""

    def __skip_layers(name):
Collaborator

Should we keep this, or is it only for debugging purposes?
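If it stays, a prefix-based filter is one plausible shape for it; everything below is a hypothetical sketch, not the PR's actual implementation:

```python
# Hypothetical sketch of a debug-only layer filter for load_weights().
def make_skip_layers(skip_prefixes):
    def skip_layers(name: str) -> bool:
        # skip any weight whose name starts with one of the given prefixes
        return any(name.startswith(p) for p in skip_prefixes)
    return skip_layers

skip = make_skip_layers(['model.layers.30.', 'model.layers.31.'])
```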

<li>Qwen2-VL (2B, 7B, 72B)</li>
<li>Qwen2.5-VL (3B, 7B, 72B)</li>
<li>Qwen3-VL (2B - 235B)</li>
<li>Qwen3.5</li>
Collaborator

Consider adding the model size here, e.g. Qwen3.5 (27B - 397B), and updating the other README files as well.

torch.testing.assert_close(conv_state_clone, conv_state, rtol=1e-3, atol=1e-3)


class TestCausalConv1dFn:
Collaborator

We may need to skip this test case when causal_conv1d is not installed.
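One way to guard the suite, sketched under the assumption that the tests use pytest markers (importlib.util.find_spec probes for the package without importing it):

```python
import importlib.util

# Probe for the optional dependency without importing it.
HAS_CAUSAL_CONV1D = importlib.util.find_spec('causal_conv1d') is not None

# The test class could then be guarded with, e.g.:
#   @pytest.mark.skipif(not HAS_CAUSAL_CONV1D, reason='causal_conv1d not installed')
#   class TestCausalConv1dFn: ...
```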

add_generation_prompt=add_generation_prompt,
**kwargs)
# Remove the sentinel part.
prompt = prompt[len(self.sentinel_system_prompt):] if len(self.sentinel_system_prompt) > 0 else prompt
Collaborator

The unit test fails because self.sentinel_system_prompt is None.
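A defensive version of the stripping logic that tolerates a None sentinel might look like this (a sketch only; the actual fix may differ):

```python
def strip_sentinel(prompt: str, sentinel) -> str:
    # treat a missing sentinel (None or '') as "nothing to strip"
    if sentinel:
        return prompt[len(sentinel):]
    return prompt
```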

@lvhan028 (Collaborator)

Evaluation test failed on commit id 6124d75

| GPQA_diamond_repeat_4 | 772ea0 | accuracy (4 runs average) | gen | 52.78 |


Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Request support for the qwen3.5 model series

5 participants