
Conversation

@kaiyux (Member) commented Oct 23, 2025

Reverts #8106

Summary by CodeRabbit

  • New Features
    • Added support for configurable summation behavior in expert model routing to optimize inference performance.
  • Tests
    • Improved test coverage for expert model configurations on multi-GPU setups.
    • Enhanced device placement optimization for model weights during testing.

@kaiyux kaiyux requested review from a team as code owners October 23, 2025 02:50
@kaiyux (Member, Author) commented Oct 23, 2025

/bot run --only-multi-gpu-test

@coderabbitai bot (Contributor) commented Oct 23, 2025

📝 Walkthrough

The changes introduce a new alltoall_result_do_sum flag to control summation reduction during alltoall result combination in the fused MoE wide EP implementation. The flag is threaded through method signatures, exposed via **kwargs in the MoE interface, and conditionally passed from the DeepSeekV3 model; test infrastructure is also updated to enable MNNVL variant testing with GPU-resident weights.

Changes

MoE Wide EP Implementation Enhancement
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
Added an alltoall_result_do_sum: bool = True parameter to the forward_chunk, forward_impl, and alltoall_combine methods. The flag controls whether summation reduction is applied during alltoall result combination, via the do_reduce parameter in the MNNVL path, and is threaded through the internal call chains and chunked processing paths.
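A minimal sketch of that threading pattern, assuming simplified signatures (the real methods carry many more arguments, and _mnnvl_combine_op here is a hypothetical stand-in for the actual MNNVL combine kernel):

```python
import torch


def _mnnvl_combine_op(t: torch.Tensor, do_reduce: bool) -> torch.Tensor:
    """Hypothetical stand-in for the MNNVL all-to-all combine kernel."""
    return t.sum(dim=0, keepdim=True) if do_reduce else t


class WideEPMoE(torch.nn.Module):
    """Sketch of how alltoall_result_do_sum travels through the call chain."""

    def forward_impl(self, x, alltoall_result_do_sum: bool = True):
        # Chunked processing path: every chunk forwards the same flag.
        outputs = [
            self.forward_chunk(chunk,
                               alltoall_result_do_sum=alltoall_result_do_sum)
            for chunk in x.chunk(2)
        ]
        return torch.cat(outputs)

    def forward_chunk(self, x, alltoall_result_do_sum: bool = True):
        expert_out = x  # expert dispatch and computation elided in this sketch
        return self.alltoall_combine(
            expert_out, alltoall_result_do_sum=alltoall_result_do_sum)

    def alltoall_combine(self, expert_out, alltoall_result_do_sum: bool = True):
        # In the MNNVL path the flag maps onto the kernel's do_reduce
        # argument; False leaves the per-rank partial results unsummed.
        return _mnnvl_combine_op(expert_out, do_reduce=alltoall_result_do_sum)
```
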
MoE Interface Generalization
tensorrt_llm/_torch/modules/fused_moe/interface.py
Added **kwargs to public MoE.forward signature to accept arbitrary keyword arguments. Propagated **kwargs through calls to moe_custom_op (accelerated path) and forward_impl, enabling extra parameters to reach underlying implementations.
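Roughly, the pass-through might look like this (a sketch; enable_custom_op and moe_custom_op are hypothetical stand-ins for the real attribute and custom-op hook):

```python
import torch


class MoE(torch.nn.Module):
    """Sketch of the **kwargs pass-through on the MoE interface."""

    def forward(self, x, *args, **kwargs):
        # Extra keyword arguments (e.g. alltoall_result_do_sum) flow through
        # untouched, so the base interface needs no knowledge of them.
        if getattr(self, "enable_custom_op", False):
            return self.moe_custom_op(x, *args, **kwargs)
        return self.forward_impl(x, *args, **kwargs)
```
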
Model Integration
tensorrt_llm/_torch/models/modeling_deepseekv3.py
Imported WideEPMoE class. Updated compute_routed_output to conditionally pass alltoall_result_do_sum=False keyword argument when routing through WideEPMoE expert instances.
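The dispatch could reduce to something like the following (the isinstance gate and the import are from the summary above; the surrounding signature is invented for illustration):

```python
from tensorrt_llm._torch.modules.fused_moe.fused_moe_wide_ep import WideEPMoE


def compute_routed_output(experts, hidden_states, router_logits):
    kwargs = {}
    if isinstance(experts, WideEPMoE):
        # Only the wide-EP implementation understands this flag; deferring
        # the summation lets the reduction be handled elsewhere.
        kwargs["alltoall_result_do_sum"] = False
    return experts(hidden_states, router_logits, **kwargs)
```
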
Test Configuration
tests/integration/test_lists/test-db/l0_dgx_b200.yml, tests/integration/test_lists/test-db/l0_dgx_h100.yml
Added test_fused_moe_alltoall[MNNVL] and test_fused_moe_alltoall_fp4[MNNVL] test cases to pre-merge MPI workflow blocks for multi-GPU configurations.
Test Code Updates
tests/unittest/_torch/modules/test_fused_moe.py
Modified per_rank_test_fused_moe_alltoall to create the weight tensors (w1_weight, w2_weight, w3_weight) on the CUDA device. Removed the unconditional skip decorator from test_fused_moe_alltoall_fp4 and added a conditional skip for the DeepEPLowLatency alltoall method type.
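Condensed, the test-side changes might look like this (fixture names and shapes are invented; AlltoallMethodType does live in fused_moe_wide_ep.py, per the code graph below):

```python
import pytest
import torch

from tensorrt_llm._torch.modules.fused_moe.fused_moe_wide_ep import (
    AlltoallMethodType)


def per_rank_test_fused_moe_alltoall(alltoall_method_type,
                                     hidden_size=128,
                                     intermediate_size=512):
    # Conditional skip (in the actual change this guards the fp4 variant,
    # test_fused_moe_alltoall_fp4, replacing its unconditional decorator).
    if alltoall_method_type == AlltoallMethodType.DeepEPLowLatency:
        pytest.skip("DeepEPLowLatency path is not exercised by this test")

    # Weights are created directly on the GPU, avoiding a host round-trip.
    device = torch.device("cuda")
    w1_weight = torch.randn(intermediate_size, hidden_size, device=device)
    w2_weight = torch.randn(hidden_size, intermediate_size, device=device)
    w3_weight = torch.randn(intermediate_size, hidden_size, device=device)
    ...  # remainder of the per-rank test elided
```
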

Sequence Diagram

sequenceDiagram
    participant DeepSeekV3
    participant MoEInterface
    participant WideEPMoE
    participant FusedMoEWideEP

    DeepSeekV3->>MoEInterface: forward(x, ..., alltoall_result_do_sum=False)
    MoEInterface->>WideEPMoE: forward_impl(x, ..., alltoall_result_do_sum=False)
    WideEPMoE->>FusedMoEWideEP: forward_chunk(x, ..., alltoall_result_do_sum=False)
    FusedMoEWideEP->>FusedMoEWideEP: alltoall_combine(result, ..., do_reduce=False)
    FusedMoEWideEP-->>FusedMoEWideEP: Skip summation reduction based on flag
    FusedMoEWideEP-->>WideEPMoE: Return combined output
    WideEPMoE-->>MoEInterface: Return result
    MoEInterface-->>DeepSeekV3: Return final output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes introduce a new parameter that is consistently threaded through multiple method signatures in a clear pattern. However, the diff spans six files with varying change types, from signature updates to test infrastructure modifications, requiring separate reasoning for each cohort. The logic itself is straightforward (parameter passing and conditional reduction control), but the breadth of affected components and the need to verify parameter propagation across layers increase the review effort.

Pre-merge checks

❌ Failed checks (2 warnings)

Description Check (⚠️ Warning)
Explanation: The PR description consists of the single sentence "Reverts #8106" and does not follow the repository's PR template. The template requires sections for Description (explaining the issue and solution), Test Coverage (listing relevant tests), and a PR Checklist; the current description is missing all of that substantive content, including what the revert addresses, why it was necessary, which tests safeguard the changes, and confirmation of coding-guideline and dependency checks.
Resolution: Expand the PR description to follow the template by adding (1) a Description section explaining what PR #8106 changed and why reverting it is necessary, (2) a Test Coverage section listing the MNNVL AlltoAll tests being re-enabled and any other relevant tests, and (3) a completed PR Checklist confirming adherence to coding guidelines, test coverage, and other pre-submission requirements.

Docstring Coverage (⚠️ Warning)
Explanation: Docstring coverage is 0.00%, below the required threshold of 80.00%.
Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)

Title Check (✅ Passed)
Explanation: The PR title "[None] [test] Add MNNVL AlltoAll tests to pre-merge" accurately describes the test additions in this changeset and follows the required format, with no ticket identifier and a [test] type designation. The title focuses on the test aspect and omits the supporting code changes to the fused MoE modules and modeling files, but because it clearly refers to a real and meaningful part of the change (the MNNVL AlltoAll test additions), it is meaningfully related to the changeset and lets a teammate understand a primary aspect of what is being changed.

📜 Recent review details

Configuration used: .coderabbit.yaml

Review profile: CHILL


📥 Commits

Reviewing files that changed from the base of the PR and between a7c2c8c and 35174b3.

📒 Files selected for processing (6)
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py (2 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (10 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/interface.py (2 hunks)
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml (1 hunks)
  • tests/unittest/_torch/modules/test_fused_moe.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/_torch/modules/fused_moe/interface.py
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
  • tests/unittest/_torch/modules/test_fused_moe.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/_torch/modules/fused_moe/interface.py
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
  • tests/unittest/_torch/modules/test_fused_moe.py
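A contrived snippet illustrating several of the naming rules quoted above (all names invented):

```python
"""Toy module following the naming guidelines quoted above."""

G_CALL_COUNT = 0  # global: upper SNAKE_CASE with a 'G' prefix
MAX_RETRIES = 3   # constant: upper SNAKE_CASE


class LatencyTracker:  # class: PascalCase
    """Tracks request latencies.

    Attributes:
        latencies_ms: Latencies recorded so far, in milliseconds.
    """

    def __init__(self):
        # Externally visible members are initialized in the constructor.
        self.latencies_ms = []

    def record_latency(self, latency_ms: float) -> None:  # method: snake_case
        """Records one latency sample."""
        self.latencies_ms.append(latency_ms)

    def percentile_99(self) -> float:
        """Returns the 99th-percentile latency of the recorded samples."""
        k_99th_fraction = 0.99  # 'k' prefix: name would start with a digit
        ordered = sorted(self.latencies_ms)
        return ordered[int(k_99th_fraction * (len(ordered) - 1))]
```
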
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/_torch/modules/fused_moe/interface.py
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
  • tests/unittest/_torch/modules/test_fused_moe.py
🧬 Code graph analysis (3)
tensorrt_llm/_torch/models/modeling_deepseekv3.py (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)
  • WideEPMoE (41-1086)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (2)
tensorrt_llm/_torch/utils.py (1)
  • Fp4QuantizedTensor (99-106)
tensorrt_llm/_mnnvl_utils.py (1)
  • MoEAlltoallInfo (342-349)
tests/unittest/_torch/modules/test_fused_moe.py (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)
  • AlltoallMethodType (30-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit Check
  • GitHub Check: Analyze (python)


@tensorrt-cicd (Collaborator):

PR_Github #22234 [ run ] triggered by Bot. Commit: 35174b3

@tensorrt-cicd (Collaborator):

PR_Github #22234 [ run ] completed with state SUCCESS. Commit: 35174b3
/LLM/main/L0_MergeRequest_PR pipeline #16764 (Partly Tested) completed with status: 'FAILURE'

@kaiyux (Member, Author) commented Oct 24, 2025

/bot run --only-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #22353 [ run ] triggered by Bot. Commit: 35174b3

@tensorrt-cicd (Collaborator):

PR_Github #22353 [ run ] completed with state SUCCESS. Commit: 35174b3
/LLM/main/L0_MergeRequest_PR pipeline #16852 (Partly Tested) completed with status: 'SUCCESS'

@kaiyux (Member, Author) commented Oct 24, 2025

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #22397 [ run ] triggered by Bot. Commit: 35174b3

@tensorrt-cicd (Collaborator):

PR_Github #22397 [ run ] completed with state SUCCESS. Commit: 35174b3
/LLM/main/L0_MergeRequest_PR pipeline #16880 completed with status: 'FAILURE'
