@wenscarl (Contributor) commented Oct 18, 2025

Purpose

Add NVFP4 dispatch for the DeepEP low-latency mode. The dispatch kernel fuses quantization with the dispatch step.
Depends on deepseek-ai/DeepEP#341 and #25990. This PR should be rebased after #25990 is merged.
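
As a rough illustration of why quantizing before dispatch is attractive, the sketch below estimates the per-token payload sent between ranks in BF16 versus NVFP4. The hidden size (7168) and the NVFP4 layout (two 4-bit values per byte plus one 1-byte scale per 16-element block) are assumptions made for this example; the actual DeepEP wire format may add padding or metadata, and any per-tensor global scale is ignored here.

```python
# Rough per-token dispatch payload, BF16 vs. NVFP4 (illustrative assumptions only).
HIDDEN_SIZE = 7168  # assumed hidden size of the model being served
BLOCK_SIZE = 16     # assumed number of elements sharing one NVFP4 scale


def bf16_bytes(hidden: int) -> int:
    """Bytes needed to send one token's activations in BF16 (2 bytes each)."""
    return hidden * 2


def nvfp4_bytes(hidden: int, block: int = BLOCK_SIZE) -> int:
    """Bytes for packed 4-bit values plus one 1-byte scale per block."""
    packed = hidden // 2      # two 4-bit values per byte
    scales = hidden // block  # one scale byte per block
    return packed + scales


bf16 = bf16_bytes(HIDDEN_SIZE)
fp4 = nvfp4_bytes(HIDDEN_SIZE)
print(f"BF16: {bf16} B/token, NVFP4: {fp4} B/token, ratio: {bf16 / fp4:.2f}x")
```

Under these assumptions the quantized payload is roughly 3.5x smaller, which is the motivation for doing the quantization inside the dispatch rather than after it.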

Test Plan

NVFP4 dispatch:
VLLM_DEEPEPLL_NVFP4_DISPATCH=1 \
VLLM_USE_FLASHINFER_MOE_FP4=1 \
VLLM_USE_STANDALONE_COMPILE=0 \
VLLM_FLASHINFER_MOE_BACKEND="cutedsl" \
VLLM_WORKER_MULTIPROC_METHOD=spawn \
VLLM_ALL2ALL_BACKEND="deepep_low_latency" \
lm_eval --model vllm \
  --model_args pretrained=nvidia/DeepSeek-R1-0528-FP4,data_parallel_size=4,enable_expert_parallel=True,tensor_parallel_size=1,enforce_eager=True,max_model_len=2048 \
  --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto

Test Result

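| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9484 | ± 0.0061 |
| | | strict-match | 5 | exact_match | 0.9462 | ± 0.0062 |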

BF16 dispatch:
with VLLM_DEEPEPLL_NVFP4_DISPATCH=0:

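| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9492 | ± 0.0060 |
| | | strict-match | 5 | exact_match | 0.9439 | ± 0.0063 |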
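
One way to read the two runs above is to check whether the NVFP4 and BF16 scores differ by more than their reported standard errors. The sketch below computes a simple z-score per filter from the numbers in the test results. It treats the two runs as independent, which is an assumption of this example (both runs use the same gsm8k test set, so a paired test would be more precise).

```python
import math

# (value, stderr) pairs copied from the reported gsm8k results.
results = {
    "flexible-extract": {"nvfp4": (0.9484, 0.0061), "bf16": (0.9492, 0.0060)},
    "strict-match": {"nvfp4": (0.9462, 0.0062), "bf16": (0.9439, 0.0063)},
}


def z_score(a, b):
    """Difference of two scores divided by the combined standard error."""
    (value_a, stderr_a), (value_b, stderr_b) = a, b
    return (value_a - value_b) / math.sqrt(stderr_a**2 + stderr_b**2)


scores = {name: z_score(r["nvfp4"], r["bf16"]) for name, r in results.items()}
for name, z in scores.items():
    print(f"{name}: z = {z:+.2f}")
```

Both z-scores have magnitude well below 1, so under this rough check the two dispatch modes are indistinguishable on gsm8k.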

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@wenscarl changed the title from "[Core] CuteDSL MoE with Nvfp4 DeepEP dispatch" to "[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch" Oct 20, 2025
mergify bot commented Oct 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wenscarl.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Oct 20, 2025
mergify bot removed the needs-rebase label Oct 20, 2025
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +191 to +206
def has_flashinfer_cutedsl_grouped_gemm_nt_masked() -> bool:
    """Return ``True`` if FlashInfer CUTLASS fused MoE is available."""
    if not has_flashinfer_cutedsl():
        return False

    # Check if all required functions are available
    required_functions = [
        ("flashinfer.cute_dsl.blockscaled_gemm", "grouped_gemm_nt_masked"),
        ("flashinfer", "scaled_fp4_grouped_quantize"),
        ("flashinfer", "silu_and_scaled_nvfp4_experts_quantize"),
    ]

    for module_name, attr_name in required_functions:
        mod = _get_submodule(module_name)
        if not mod or not hasattr(mod, attr_name):
            return False

[P1] Fix typo in CuteDSL availability check

The new has_flashinfer_cutedsl_grouped_gemm_nt_masked guard always returns False because the third required symbol is spelled "silu_and_scaled_nvfp4_experts_quantize", but every other place in this commit (and in the FlashInfer API) refers to silu_and_mul_scaled_nvfp4_experts_quantize. As written the attribute lookup will fail even when the kernel is correctly installed, so the capability probe disables the entire CuteDSL path and the nvfp4 DeepEP dispatch can never be selected. Please rename the checked attribute to match the actual import.
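
The failure mode described here can be reproduced in isolation. The sketch below is a simplified stand-in for the probe, not the vLLM code: `has_all` and `fake_flashinfer` are hypothetical names, and the fake module only mimics an installation that exposes the symbol names the review says the FlashInfer API uses.

```python
import types


def has_all(required, modules):
    """Return True only if every (module, attribute) pair resolves."""
    for module_name, attr_name in required:
        mod = modules.get(module_name)
        if mod is None or not hasattr(mod, attr_name):
            return False
    return True


# Hypothetical stand-in for an installed FlashInfer; not the real package.
fake_flashinfer = types.SimpleNamespace(
    scaled_fp4_grouped_quantize=lambda *args, **kwargs: None,
    silu_and_mul_scaled_nvfp4_experts_quantize=lambda *args, **kwargs: None,
)
modules = {"flashinfer": fake_flashinfer}

misspelled = [
    ("flashinfer", "scaled_fp4_grouped_quantize"),
    ("flashinfer", "silu_and_scaled_nvfp4_experts_quantize"),  # missing "_mul"
]
corrected = [
    ("flashinfer", "scaled_fp4_grouped_quantize"),
    ("flashinfer", "silu_and_mul_scaled_nvfp4_experts_quantize"),
]

probe_misspelled = has_all(misspelled, modules)
probe_corrected = has_all(corrected, modules)
print(probe_misspelled, probe_corrected)
```

Because the probe fails silently, a misspelled name looks the same as a missing kernel. A unit test that asserts the probe returns `True` in an environment where the kernels are known to be installed would catch this kind of typo.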
