
Conversation

@amirkl94 (Contributor) commented Oct 21, 2025

📌 Description

Verify that the FP8 quant scales are non-null in the cutlass FusedMoE path. Currently, if these tensors are passed as None from Python, it results in a segmentation fault.

Summary by CodeRabbit

  • Bug Fixes
  • Added validation for FP8 quantization parameters to prevent null reference issues during quantization, reducing the risk of runtime errors when processing quantized model data.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @amirkl94, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical stability issue in the Cutlass FP8 FusedMoE implementation by adding robust validation for quantization scale tensors. By ensuring these tensors are not null, the change effectively prevents segmentation faults that could arise from None values being passed from Python, thereby improving the overall reliability and robustness of the FusedMoE computation.

Highlights

  • Null Pointer Safeguard: Introduced explicit null checks for critical quantization scale tensors (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) within the Cutlass FP8 FusedMoE path.
  • Segmentation Fault Prevention: The added checks prevent segmentation faults that previously occurred when these quantization scale tensors were passed as None from Python, enhancing the stability of the FusedMoE operation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review — /gemini review: Performs a code review for the current pull request in its current state.
  • Pull Request Summary — /gemini summary: Provides a summary of the current pull request in its current state.
  • Comment — @gemini-code-assist: Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help — /gemini help: Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai bot (Contributor) commented Oct 21, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Added runtime null checks for FP8 quantization parameters (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) in getQuantParams before type validation to tighten input validation for FP8 quant scales.

Changes

  • FP8 Quantization Null Checks — csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu: Inserted runtime null validation for the FP8 dequantization and quantization parameter pointers (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) prior to the existing type checks.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Points to review:

  • Confirm null-check placement does not change error reporting semantics.
  • Verify no assumptions elsewhere rely on those pointers being non-null before this function.

Poem

🐰 I hopped through code with careful pace,

Checked each pointer in its place,
FP8 scales now won't surprise,
No nulls sneak in—bright, wise eyes,
A little hop, a safer race.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description Check ⚠️ Warning — The pull request description provides only the Description section from the required template; it is missing the "Related Issues" section, the "Pre-commit Checks" checklist, the "Tests" checklist, and the "Reviewer Notes" section. The author should complete the description by linking any related issues, checking off the relevant pre-commit and test items, and optionally adding reviewer notes. Following the template structure ensures consistency and helps reviewers understand the testing and quality assurance that has been performed.
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check ✅ Passed — The title "Fix: Verify scales are not None for Cutlass FP8 FusedMoE" directly and accurately describes the primary change: adding runtime null checks for FP8 quantization parameters to prevent segmentation faults when these tensors are passed as None from Python. The title is concise, specific, and free of unnecessary noise or vague terminology.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4bce8ff and dd1864b.

📒 Files selected for processing (1)
  • csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs

Comment @coderabbitai help to get the list of available commands and usage tips.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds important null pointer checks for quantization scales in the FP8 FusedMoE path, which is a good defensive measure to prevent potential segmentation faults. The change is correct and addresses the issue described. I've found a minor typo in one of the new error messages and have provided a suggestion to fix it.

Comment on lines 805 to 806
TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
<< "Expecting fc1fc2_dequant_dequant to be non null";

medium

There appears to be a copy-paste error in this error message. It should refer to fc2_dequant to match the variable being checked.

      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
          << "Expecting fc2_dequant to be non null";

Collaborator

Please take a look at this comment from Gemini.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1)

839-843: Add individual null checks to other quantization paths like the FP8 path.

The FP8 path (lines 803-808) checks each extracted tensor element for null (fc1_dequant.get() != nullptr), but other quantization modes only verify quant_scales.value().size() before directly accessing array elements. Since Python tests pass quant_scales=None to these other paths (NVFP4, W4A8_MXFP4_FP8, W4A8_MXFP4_MXFP8, BlockScaling, W4A16, INT4), they share the same segfault vulnerability.

Add per-item null checks after extracting each tensor from quant_scales.value() in:

  • W4A8_MXFP4_FP8 (lines 839-843): after extracting fc1_weight_block, fc1_global, fc2_act_global, fc2_weight_block, fc2_global
  • W4A8_MXFP4_MXFP8 (lines 904-907): after extracting fc1_weight_block, fc1_global, fc2_weight_block, fc2_global
  • NVFP4 (lines 963-968): after extracting all 6 scale tensors
  • BlockScaling (lines 1028-1029): after extracting fc1_scales, fc2_scales
  • W4A16 (lines 1037-1038): after extracting fc1_weight_scales, fc2_weight_scales
  • INT4 (lines 1048-1055): after extracting all 8 scale tensors
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3f2596 and 4bce8ff.

📒 Files selected for processing (1)
  • csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs

@yongwww (Collaborator) commented Oct 21, 2025

/bot run

@flashinfer-bot (Collaborator)

GitLab MR !88 has been created, and the CI pipeline #36988944 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot (Collaborator)

[FAILED] Pipeline #36988944: 1/17 passed

Signed-off-by: Amir Klein <[email protected]>
@amirkl94 amirkl94 requested a review from djmmoss as a code owner October 26, 2025 08:04
@yzh119 (Collaborator) commented Oct 27, 2025

Failed UTs are not relevant, let's merge this first.

@yzh119 yzh119 merged commit 99657ed into flashinfer-ai:main Oct 27, 2025
4 checks passed