[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch #27141
Conversation
Signed-off-by: Shu Wang <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Shu Wang. <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
def has_flashinfer_cutedsl_grouped_gemm_nt_masked() -> bool:
    """Return ``True`` if the FlashInfer CuteDSL grouped GEMM path is available."""
    if not has_flashinfer_cutedsl():
        return False

    # Check if all required functions are available
    required_functions = [
        ("flashinfer.cute_dsl.blockscaled_gemm", "grouped_gemm_nt_masked"),
        ("flashinfer", "scaled_fp4_grouped_quantize"),
        ("flashinfer", "silu_and_scaled_nvfp4_experts_quantize"),
    ]

    for module_name, attr_name in required_functions:
        mod = _get_submodule(module_name)
        if not mod or not hasattr(mod, attr_name):
            return False
    return True
Fix typo in CuteDSL availability check
The new has_flashinfer_cutedsl_grouped_gemm_nt_masked guard always returns False because the third required symbol is spelled "silu_and_scaled_nvfp4_experts_quantize", but every other place in this commit (and in the FlashInfer API) refers to silu_and_mul_scaled_nvfp4_experts_quantize. As written the attribute lookup will fail even when the kernel is correctly installed, so the capability probe disables the entire CuteDSL path and the nvfp4 DeepEP dispatch can never be selected. Please rename the checked attribute to match the actual import.
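The failure mode described above — one misspelled attribute name silently disabling the whole capability — can be reproduced with a small, self-contained sketch of the same probe pattern. The helpers here (`_get_submodule`, `probe`) are illustrative stand-ins for vLLM's internals, not the actual implementation, and standard-library modules substitute for `flashinfer`:

```python
import importlib


def _get_submodule(module_name: str):
    """Import a module by dotted path, returning None if it is unavailable."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def probe(required: list) -> bool:
    """Return True only if every (module, attribute) pair resolves."""
    for module_name, attr_name in required:
        mod = _get_submodule(module_name)
        if mod is None or not hasattr(mod, attr_name):
            return False
    return True


# A correct spelling passes; a single typo in one attribute name
# makes the entire probe report the feature as unavailable.
print(probe([("math", "sqrt")]))   # True
print(probe([("math", "sqrtt")]))  # False: misspelled attribute
```

This is why the typo matters even though the kernel is installed: `hasattr` fails on the misspelled name and the probe returns `False` for the whole feature.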
Purpose
Adds dispatch with nvfp4 in DeepEP low-latency mode; the dispatch fuses quantization into the dispatch step.
Depends on deepseek-ai/DeepEP#341 and #25990; should be rebased after #25990 is merged.
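For intuition about the quantization half of that fusion: nvfp4-style formats quantize values in small blocks, each sharing one scale factor. The sketch below is illustrative only — pure Python, a hypothetical block size, and a coarse integer grid standing in for the real e2m1 code set; the actual kernel performs this math fused into the DeepEP send path on the GPU:

```python
# Illustrative sketch only: per-block scaled "fp4-style" quantization.
# BLOCK, FP4_MAX, and all helper names are assumptions for exposition,
# not the real nvfp4 implementation fused into DeepEP dispatch.

BLOCK = 16          # elements sharing one scale factor (hypothetical)
FP4_MAX = 6.0       # largest magnitude representable in e2m1 fp4


def quantize_block(values):
    """Quantize one block to small integer codes plus one shared scale."""
    amax = max(abs(v) for v in values)
    scale = (amax / FP4_MAX) if amax else 1.0
    # Round onto a coarse integer grid standing in for the e2m1 code set.
    codes = [int(max(-FP4_MAX, min(FP4_MAX, round(v / scale)))) for v in values]
    return codes, scale


def dequantize_block(codes, scale):
    return [c * scale for c in codes]


block = [0.1 * i for i in range(BLOCK)]
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
# Round-trip error is bounded by half a scale step on this coarse grid.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(block, restored))
```

Fusing this rounding into the dispatch kernel avoids materializing the quantized tensor in a separate pass before the all-to-all send.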
Test Plan
NVFP4 dispatch:
VLLM_DEEPEPLL_NVFP4_DISPATCH=1 \
VLLM_USE_FLASHINFER_MOE_FP4=1 \
VLLM_USE_STANDALONE_COMPILE=0 \
VLLM_FLASHINFER_MOE_BACKEND="cutedsl" \
VLLM_WORKER_MULTIPROC_METHOD=spawn \
VLLM_ALL2ALL_BACKEND="deepep_low_latency" \
lm_eval --model vllm --model_args pretrained=nvidia/DeepSeek-R1-0528-FP4,data_parallel_size=4,enable_expert_parallel=True,tensor_parallel_size=1,enforce_eager=True,max_model_len=2048 --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
Test Result
BF16 dispatch:
with
VLLM_DEEPEPLL_NVFP4_DISPATCH=0