Qwen3.5 #4351

Open
grimoire wants to merge 19 commits into InternLM:main from grimoire:qwen3.5

Conversation

@grimoire (Collaborator) commented Feb 11, 2026

Support Qwen3.5-397B-A17B

Copilot AI review requested due to automatic review settings February 11, 2026 10:40
@grimoire grimoire marked this pull request as draft February 11, 2026 10:44
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds initial (WIP) support for Qwen3.5 multimodal models across LMDeploy’s VL adapter layer and PyTorch engine, including module mappings and configuration building, plus documentation updates.

Changes:

  • Introduces new Qwen3.5 PyTorch model implementations (dense + MoE) and registers them in the model module map.
  • Adds a VL model wrapper for Qwen3.5 and registers it in the VL model builder.
  • Adds a Qwen3.5 config builder and updates docs/README to list Qwen3.5 as supported.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.

Show a summary per file
| File | Description |
| --- | --- |
| lmdeploy/vl/model/qwen3_5.py | Adds VL-side Qwen3.5 wrapper/registration and preprocessor setup. |
| lmdeploy/vl/model/builder.py | Ensures the Qwen3.5 VL wrapper is imported/registered. |
| lmdeploy/pytorch/nn/norm.py | Updates type annotations to use `…` |
| lmdeploy/pytorch/models/qwen3_5.py | Adds the main PyTorch implementation for the Qwen3.5 VLM (vision + text + generation utilities). |
| lmdeploy/pytorch/models/qwen3_5_moe.py | Adds MoE variant wiring and expert weight-loading logic. |
| lmdeploy/pytorch/models/module_map.py | Registers HF architecture names to LMDeploy Qwen3.5 model entrypoints. |
| lmdeploy/pytorch/configurations/qwen3_5.py | Adds config builder for Qwen3.5(+MoE), including state shapes for linear-attn layers. |
| docs/zh_cn/supported_models/supported_models.md | Adds Qwen3.5 row to supported models table. |
| docs/en/supported_models/supported_models.md | Adds Qwen3.5 row to supported models table. |
| README.md / README_zh-CN.md / README_ja.md | Lists Qwen3.5 among supported models. |


Comment on lines +20 to +32
@VISION_MODELS.register_module()
class Qwen3_5Model(Qwen3VLModel):
    """Qwen3_5 model."""

    _arch = ['Qwen3_5ForConditionalGeneration', 'Qwen3_5MoeForConditionalGeneration']

    def build_preprocessor(self):
        check_transformers()
        self.processor = AutoProcessor.from_pretrained(self.model_path)
        tokenizer = self.processor.tokenizer
        self.image_token = self.processor.image_token
        self.image_token_id = tokenizer.encode(self.image_token)[-1]
        self.mm_processor_kwargs = None
Copilot AI Feb 11, 2026

New Qwen3.5 VL preprocessor/model registration is added, but there are existing processor tests for Qwen3-VL (tests/test_lmdeploy/test_vl/test_qwen3vl_processor.py) and none for Qwen3.5. Please add an analogous unit test to validate build_preprocessor() + preprocess() behavior (including mm_processor_kwargs min/max pixel handling) so regressions are caught early.
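A minimal sketch of the lookup such a test could exercise, using a fake tokenizer; the class, token string, and id below are made-up stand-ins, not the real test API or vocabulary:

```python
# Hypothetical test sketch for the image-token-id lookup in build_preprocessor().
class FakeTokenizer:
    def encode(self, text):
        # pretend the image token encodes to a sequence ending in id 9999
        return [1, 9999] if text == '<|image_pad|>' else [0]

def resolve_image_token_id(tokenizer, image_token):
    # mirrors the wrapper's lookup: take the last id of the encoded token
    return tokenizer.encode(image_token)[-1]

assert resolve_image_token_id(FakeTokenizer(), '<|image_pad|>') == 9999
```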

@grimoire grimoire changed the title [WIP] Qwen3.5 Qwen3.5 Feb 16, 2026
@grimoire grimoire marked this pull request as ready for review February 16, 2026 12:32
@lvhan028 lvhan028 added the enhancement New feature or request label Feb 24, 2026
@lvhan028 lvhan028 requested a review from CUHKSZzxy February 24, 2026 04:45

conv_state_shape = (num_delta_layers, conv_dim, conv_kernel_size)
recurrent_state_shape = (num_delta_layers, num_v_heads, head_k_dim, head_v_dim)
dtype = torch.bfloat16
Collaborator

Why not use dtype from hf_config?

Collaborator Author

causal_conv1d and fla only support bfloat16
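For illustration, the two state caches from the snippet above could be allocated like this; the sizes are placeholders (not the real Qwen3.5 config values), and the dtype is pinned to bfloat16 because the kernels require it:

```python
import torch

# Placeholder sizes for illustration only; real values come from hf_config.
num_delta_layers, conv_dim, conv_kernel_size = 4, 512, 4
num_v_heads, head_k_dim, head_v_dim = 8, 64, 64

# dtype is hard-coded: causal_conv1d and fla kernels only support bfloat16.
conv_state = torch.zeros(num_delta_layers, conv_dim, conv_kernel_size,
                         dtype=torch.bfloat16)
recurrent_state = torch.zeros(num_delta_layers, num_v_heads, head_k_dim,
                              head_v_dim, dtype=torch.bfloat16)
```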

@windreamer windreamer linked an issue Feb 25, 2026 that may be closed by this pull request
@lvhan028 lvhan028 requested a review from RunningLeon February 25, 2026 07:11
@lvhan028 (Collaborator)

Can we put flash-linear-attention in requirements? If so, the fla installation hint can be removed from qwen3_next.py

@lvhan028 (Collaborator)

The causal-conv1d installation should be included in the Dockerfile, and the transformers constraints need to be removed. I'll handle this in a separate PR.

Comment on lines +139 to +141
# do sigmoid and float here to prevent contiguous kernel
b = b.sigmoid().flatten(-2, -1)
a = a.float().flatten(-2, -1)
Collaborator

I'm not entirely clear on the reasoning behind b.sigmoid and a.float here. Previously, these operations were performed in forward. Are you suggesting that the previous approach affected memory contiguity?

Collaborator Author

a and b are chunks of in_proj_ba(hidden_states), which are not contiguous.
Flattening a and b directly would require another transpose kernel. An elementwise op automatically produces a contiguous output, so we can skip the flatten kernel.
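The effect can be demonstrated in isolation; the shapes below are made up and only stand in for the fused-projection output:

```python
import torch

# Illustrative stand-in for in_proj_ba(hidden_states): one fused projection
# whose output is chunked into b and a along the last dim.
ba = torch.randn(2, 8, 4, 12)          # (batch, seq, heads, 2 * head_dim)
b, a = ba.chunk(2, dim=-1)             # chunks are views -> non-contiguous
assert not b.is_contiguous()

# flatten(-2, -1) on the non-contiguous view cannot be a pure view, so it
# would launch an extra copy/transpose kernel. Running the elementwise op
# first allocates a fresh contiguous output, after which flatten is free.
b = b.sigmoid()                        # elementwise -> contiguous output
assert b.is_contiguous()
b = b.flatten(-2, -1)                  # zero-copy reshape now
a = a.float().flatten(-2, -1)
```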

@lvhan028 (Collaborator)

> Can we put flash-linear-attention in requirements? If so, the fla installation hint can be removed from qwen3_next.py

Confirmed that pip install flash-linear-attention works. Please add it to requirements.txt and remove the installation hint from the code

@CUHKSZzxy (Collaborator)

Text message works well, message with images fails:

  File "/nvme1/zhouxinyu/miniconda3/envs/dev/lib/python3.10/concurrent/futures/_base.py", line 342, in _invoke_callbacks
    callback(self)
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/pipeline.py", line 408, in <lambda>
    loop).add_done_callback(lambda f: None if f.cancelled() else f.result())
  File "/nvme1/zhouxinyu/miniconda3/envs/dev/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/nvme1/zhouxinyu/miniconda3/envs/dev/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/pipeline.py", line 401, in _infer
    await asyncio.gather(*tasks)
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/pipeline.py", line 377, in _sync_resp
    async for out in g:
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/serve/core/async_engine.py", line 334, in generate
    prompt_input = await self.prompt_processor.get_prompt_input(prompt=prompt,
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/serve/processors/multimodal.py", line 265, in get_prompt_input
    return await self._get_multimodal_prompt_input(messages=prompt,
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/serve/processors/multimodal.py", line 424, in _get_multimodal_prompt_input
    results = await self.vl_encoder.wrap_for_pytorch(results,
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/vl/engine.py", line 103, in wrap_for_pytorch
    result = self.model.to_pytorch(messages,
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/vl/model/qwen3.py", line 125, in to_pytorch
    prompt, IMAGE_TOKEN = self.proc_messages(messages, chat_template, sequence_start, chat_template_kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/vl/model/qwen3.py", line 114, in proc_messages
    prompt = chat_template.messages2prompt(prompt_messages, sequence_start, **chat_template_kwargs)
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/model.py", line 186, in messages2prompt
    content = get_text(message['content'])
  File "/nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/model.py", line 31, in get_text
    return content[0]['text']
KeyError: 'text'

Likely due to chat_template issues. The logic fails here, https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py#L704

jinja2.exceptions.TemplateError: No user query found in messages.
> /nvme1/zhouxinyu/lmdeploy_dev/lmdeploy/model.py(705)__init__()
-> _, _, self.sentinel_system_messages, self.sentinel_system_prompt = self._role_instruction('system')

Consequently, qwen3.5 falls back to using BaseChatTemplate rather than HFChatTemplate.
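The KeyError can be reproduced in isolation; the message layout below is a hypothetical example, and get_text mirrors the helper in lmdeploy/model.py:

```python
# Hypothetical multimodal message: the image part comes before the text part.
message_content = [
    {'type': 'image_url', 'image_url': {'url': 'demo.jpg'}},
    {'type': 'text', 'text': 'describe the image'},
]

def get_text(content):
    # mirrors lmdeploy/model.py::get_text, which assumes content[0] is the text part
    return content[0]['text']

try:
    get_text(message_content)
    raised = False
except KeyError:
    raised = True   # the first item has no 'text' key
```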

@grimoire (Collaborator Author)

> The causal-conv1d installation should be included in the Dockerfile, and the transformers constraints need to be removed. I'll handle this in a separate PR.

I am working on causal-conv1d tilelang kernel.

@lvhan028 (Collaborator)

> Text message works well, message with images fails: […full traceback quoted above…]

@CUHKSZzxy I've fixed this issue. Please review again.

@lru_cache
def has_tilelang():
    try:
        import tilelang  # noqa: F401
Collaborator

Should we include tilelang in requirements.txt?

def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
    """Load weights."""

    def __skip_layers(name):
Collaborator

Should we keep this, or is it only for debugging purposes?
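If it stays, a prefix-based filter is one plausible shape for it; everything below is a hypothetical sketch, not the PR's actual implementation:

```python
# Hypothetical sketch of a debug-only layer filter for load_weights().
def make_skip_layers(skip_prefixes):
    def skip_layers(name: str) -> bool:
        # skip any weight whose name starts with one of the given prefixes
        return any(name.startswith(p) for p in skip_prefixes)
    return skip_layers

skip = make_skip_layers(['model.layers.30.', 'model.layers.31.'])
```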

<li>Qwen2-VL (2B, 7B, 72B)</li>
<li>Qwen2.5-VL (3B, 7B, 72B)</li>
<li>Qwen3-VL (2B - 235B)</li>
<li>Qwen3.5</li>
Collaborator

Consider adding the model size here, e.g. Qwen3.5 (27B - 397B), and updating the other README files as well.

torch.testing.assert_close(conv_state_clone, conv_state, rtol=1e-3, atol=1e-3)


class TestCausalConv1dFn:
Collaborator

We may need to skip this test case when causal_conv1d is not installed.
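One way to guard the suite, sketched under the assumption that the tests use pytest markers (importlib.util.find_spec probes for the package without importing it):

```python
import importlib.util

# Probe for the optional dependency without importing it.
HAS_CAUSAL_CONV1D = importlib.util.find_spec('causal_conv1d') is not None

# The test class could then be guarded with, e.g.:
#   @pytest.mark.skipif(not HAS_CAUSAL_CONV1D, reason='causal_conv1d not installed')
#   class TestCausalConv1dFn: ...
```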

add_generation_prompt=add_generation_prompt,
**kwargs)
# Remove the sentinel part.
prompt = prompt[len(self.sentinel_system_prompt):] if len(self.sentinel_system_prompt) > 0 else prompt
Collaborator

The unit test fails because self.sentinel_system_prompt is None.
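A defensive version of the stripping logic that tolerates a None sentinel might look like this (a sketch only; the actual fix may differ):

```python
def strip_sentinel(prompt: str, sentinel) -> str:
    # treat a missing sentinel (None or '') as "nothing to strip"
    if sentinel:
        return prompt[len(sentinel):]
    return prompt
```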

@lvhan028 (Collaborator)

Evaluation test failed on commit id 6124d75

| GPQA_diamond_repeat_4 | 772ea0 | accuracy (4 runs average) | gen | 52.78 |


Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Request support for the qwen3.5 model series

5 participants