
Conversation

@amirkl94 (Contributor) commented Oct 21, 2025

📌 Description

Verify that the FP8 quant scales are non-null in the cutlass FusedMoE path. Currently, if these tensors are passed as None from Python, it results in a segmentation fault.

Summary by CodeRabbit

  • Bug Fixes
  • Added validation for FP8 quantization parameters to prevent null reference issues during quantization, reducing the risk of runtime errors when processing quantized model data.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @amirkl94, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical stability issue in the Cutlass FP8 FusedMoE implementation by adding robust validation for quantization scale tensors. By ensuring these tensors are not null, the change effectively prevents segmentation faults that could arise from None values being passed from Python, thereby improving the overall reliability and robustness of the FusedMoE computation.

Highlights

  • Null Pointer Safeguard: Introduced explicit null checks for critical quantization scale tensors (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) within the Cutlass FP8 FusedMoE path.
  • Segmentation Fault Prevention: The added checks prevent segmentation faults that previously occurred when these quantization scale tensors were passed as None from Python, enhancing the stability of the FusedMoE operation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review — /gemini review: Performs a code review for the current pull request in its current state.
  • Pull Request Summary — /gemini summary: Provides a summary of the current pull request in its current state.
  • Comment — @gemini-code-assist: Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help — /gemini help: Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai bot (Contributor) commented Oct 21, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Added runtime null checks for FP8 quantization parameters (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) in getQuantParams before type validation to tighten input validation for FP8 quant scales.

Changes

  • FP8 Quantization Null Checks — csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu: Inserted runtime null validation for the FP8 dequantization and quantization parameter pointers (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) prior to the existing type checks.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Points to review:

  • Confirm null-check placement does not change error reporting semantics.
  • Verify no assumptions elsewhere rely on those pointers being non-null before this function.

Poem

🐰 I hopped through code with careful pace,

Checked each pointer in its place,
FP8 scales now won't surprise,
No nulls sneak in—bright, wise eyes,
A little hop, a safer race.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description Check ⚠️ Warning — The pull request description provides only the Description section from the required template; it is missing the "Related Issues" section, the "Pre-commit Checks" checklist, the "Tests" checklist, and the "Reviewer Notes" section. The author should complete the description by linking any related issues, checking off the relevant pre-commit and test items, and optionally adding reviewer notes. Following the template structure ensures consistency and helps reviewers understand the testing and quality assurance that has been performed.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check ✅ Passed — The title "Fix: Verify scales are not None for Cutlass FP8 FusedMoE" directly and accurately describes the primary change: adding runtime null checks for FP8 quantization parameters to prevent segmentation faults when these tensors are passed as None from Python. The title is concise, specific, and free of unnecessary noise or vague terminology.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4bce8ff and dd1864b.

📒 Files selected for processing (1)
  • csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs

Comment @coderabbitai help to get the list of available commands and usage tips.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds important null pointer checks for quantization scales in the FP8 FusedMoE path, which is a good defensive measure to prevent potential segmentation faults. The change is correct and addresses the issue described. I've found a minor typo in one of the new error messages and have provided a suggestion to fix it.

Comment on lines 805 to 806
TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
<< "Expecting fc1fc2_dequant_dequant to be non null";

medium

There appears to be a copy-paste error in this error message. It should refer to fc2_dequant to match the variable being checked.

      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
          << "Expecting fc2_dequant to be non null";

Collaborator

Please take a look at this comment from Gemini.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1)

839-843: Add individual null checks to other quantization paths like the FP8 path.

The FP8 path (lines 803-808) checks each extracted tensor element for null (fc1_dequant.get() != nullptr), but other quantization modes only verify quant_scales.value().size() before directly accessing array elements. Since Python tests pass quant_scales=None to these other paths (NVFP4, W4A8_MXFP4_FP8, W4A8_MXFP4_MXFP8, BlockScaling, W4A16, INT4), they share the same segfault vulnerability.

Add per-item null checks after extracting each tensor from quant_scales.value() in:

  • W4A8_MXFP4_FP8 (lines 839-843): after extracting fc1_weight_block, fc1_global, fc2_act_global, fc2_weight_block, fc2_global
  • W4A8_MXFP4_MXFP8 (lines 904-907): after extracting fc1_weight_block, fc1_global, fc2_weight_block, fc2_global
  • NVFP4 (lines 963-968): after extracting all 6 scale tensors
  • BlockScaling (lines 1028-1029): after extracting fc1_scales, fc2_scales
  • W4A16 (lines 1037-1038): after extracting fc1_weight_scales, fc2_weight_scales
  • INT4 (lines 1048-1055): after extracting all 8 scale tensors
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3f2596 and 4bce8ff.

📒 Files selected for processing (1)
  • csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs

@yongwww (Collaborator) commented Oct 21, 2025

/bot run

@flashinfer-bot (Collaborator)

GitLab MR !88 has been created, and the CI pipeline #36988944 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot (Collaborator)

[FAILED] Pipeline #36988944: 1/17 passed

Signed-off-by: Amir Klein <[email protected]>
@amirkl94 amirkl94 requested a review from djmmoss as a code owner October 26, 2025 08:04
@yzh119 (Collaborator) commented Oct 27, 2025

Failed UTs are not relevant, let's merge this first.

@yzh119 yzh119 merged commit 99657ed into flashinfer-ai:main Oct 27, 2025
4 checks passed