
Conversation

@kaiyux (Member) commented Oct 23, 2025

Reverts #8106

Summary by CodeRabbit

  • New Features
    • Added support for configurable summation behavior in expert model routing to optimize inference performance.
  • Tests
    • Improved test coverage for expert model configurations on multi-GPU setups.
    • Enhanced device placement optimization for model weights during testing.

@kaiyux kaiyux requested review from a team as code owners October 23, 2025 02:50
@kaiyux (Member, Author) commented Oct 23, 2025

/bot run --only-multi-gpu-test

@coderabbitai bot (Contributor) commented Oct 23, 2025

📝 Walkthrough

The changes introduce a new alltoall_result_do_sum flag to control summation reduction during alltoall result combination in the fused MoE wide EP implementation. The flag is threaded through method signatures, exposed via **kwargs in the MoE interface, and conditionally passed from the DeepSeekV3 model; test infrastructure is also updated to enable MNNVL variant testing with GPU-resident weights.

Changes

MoE Wide EP Implementation Enhancement
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
Added an alltoall_result_do_sum: bool = True parameter to the forward_chunk, forward_impl, and alltoall_combine methods. The flag controls whether summation reduction is applied during alltoall result combination, via the do_reduce parameter in the MNNVL path, and is threaded through the internal call chains and chunked processing paths.
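A minimal sketch of that threading pattern, assuming simplified signatures (the real methods carry many more arguments, and _mnnvl_combine_op here is a hypothetical stand-in for the actual MNNVL combine kernel):

```python
import torch


def _mnnvl_combine_op(t: torch.Tensor, do_reduce: bool) -> torch.Tensor:
    """Hypothetical stand-in for the MNNVL all-to-all combine kernel."""
    return t.sum(dim=0, keepdim=True) if do_reduce else t


class WideEPMoE(torch.nn.Module):
    """Sketch of how alltoall_result_do_sum travels through the call chain."""

    def forward_impl(self, x, alltoall_result_do_sum: bool = True):
        # Chunked processing path: every chunk forwards the same flag.
        outputs = [
            self.forward_chunk(chunk,
                               alltoall_result_do_sum=alltoall_result_do_sum)
            for chunk in x.chunk(2)
        ]
        return torch.cat(outputs)

    def forward_chunk(self, x, alltoall_result_do_sum: bool = True):
        expert_out = x  # expert dispatch and computation elided in this sketch
        return self.alltoall_combine(
            expert_out, alltoall_result_do_sum=alltoall_result_do_sum)

    def alltoall_combine(self, expert_out, alltoall_result_do_sum: bool = True):
        # In the MNNVL path the flag maps onto the kernel's do_reduce
        # argument; False leaves the per-rank partial results unsummed.
        return _mnnvl_combine_op(expert_out, do_reduce=alltoall_result_do_sum)
```
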
MoE Interface Generalization
tensorrt_llm/_torch/modules/fused_moe/interface.py
Added **kwargs to public MoE.forward signature to accept arbitrary keyword arguments. Propagated **kwargs through calls to moe_custom_op (accelerated path) and forward_impl, enabling extra parameters to reach underlying implementations.
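Roughly, the pass-through might look like this (a sketch; enable_custom_op and moe_custom_op are hypothetical stand-ins for the real attribute and custom-op hook):

```python
import torch


class MoE(torch.nn.Module):
    """Sketch of the **kwargs pass-through on the MoE interface."""

    def forward(self, x, *args, **kwargs):
        # Extra keyword arguments (e.g. alltoall_result_do_sum) flow through
        # untouched, so the base interface needs no knowledge of them.
        if getattr(self, "enable_custom_op", False):
            return self.moe_custom_op(x, *args, **kwargs)
        return self.forward_impl(x, *args, **kwargs)
```
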
Model Integration
tensorrt_llm/_torch/models/modeling_deepseekv3.py
Imported WideEPMoE class. Updated compute_routed_output to conditionally pass alltoall_result_do_sum=False keyword argument when routing through WideEPMoE expert instances.
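The dispatch could reduce to something like the following (the isinstance gate and the import are from the summary above; the surrounding signature is invented for illustration):

```python
from tensorrt_llm._torch.modules.fused_moe.fused_moe_wide_ep import WideEPMoE


def compute_routed_output(experts, hidden_states, router_logits):
    kwargs = {}
    if isinstance(experts, WideEPMoE):
        # Only the wide-EP implementation understands this flag; deferring
        # the summation lets the reduction be handled elsewhere.
        kwargs["alltoall_result_do_sum"] = False
    return experts(hidden_states, router_logits, **kwargs)
```
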
Test Configuration
tests/integration/test_lists/test-db/l0_dgx_b200.yml, tests/integration/test_lists/test-db/l0_dgx_h100.yml
Added test_fused_moe_alltoall[MNNVL] and test_fused_moe_alltoall_fp4[MNNVL] test cases to pre-merge MPI workflow blocks for multi-GPU configurations.
Test Code Updates
tests/unittest/_torch/modules/test_fused_moe.py
Modified per_rank_test_fused_moe_alltoall to create the weight tensors (w1_weight, w2_weight, w3_weight) on the CUDA device. Removed the unconditional skip decorator from test_fused_moe_alltoall_fp4 and added a conditional skip for the DeepEPLowLatency alltoall method type.
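Condensed, the test-side changes might look like this (fixture names and shapes are invented; AlltoallMethodType does live in fused_moe_wide_ep.py, per the code graph below):

```python
import pytest
import torch

from tensorrt_llm._torch.modules.fused_moe.fused_moe_wide_ep import (
    AlltoallMethodType)


def per_rank_test_fused_moe_alltoall(alltoall_method_type,
                                     hidden_size=128,
                                     intermediate_size=512):
    # Conditional skip (in the actual change this guards the fp4 variant,
    # test_fused_moe_alltoall_fp4, replacing its unconditional decorator).
    if alltoall_method_type == AlltoallMethodType.DeepEPLowLatency:
        pytest.skip("DeepEPLowLatency path is not exercised by this test")

    # Weights are created directly on the GPU, avoiding a host round-trip.
    device = torch.device("cuda")
    w1_weight = torch.randn(intermediate_size, hidden_size, device=device)
    w2_weight = torch.randn(hidden_size, intermediate_size, device=device)
    w3_weight = torch.randn(intermediate_size, hidden_size, device=device)
    ...  # remainder of the per-rank test elided
```
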

Sequence Diagram

sequenceDiagram
    participant DeepSeekV3
    participant MoEInterface
    participant WideEPMoE
    participant FusedMoEWideEP

    DeepSeekV3->>MoEInterface: forward(x, ..., alltoall_result_do_sum=False)
    MoEInterface->>WideEPMoE: forward_impl(x, ..., alltoall_result_do_sum=False)
    WideEPMoE->>FusedMoEWideEP: forward_chunk(x, ..., alltoall_result_do_sum=False)
    FusedMoEWideEP->>FusedMoEWideEP: alltoall_combine(result, ..., do_reduce=False)
    FusedMoEWideEP-->>FusedMoEWideEP: Skip summation reduction based on flag
    FusedMoEWideEP-->>WideEPMoE: Return combined output
    WideEPMoE-->>MoEInterface: Return result
    MoEInterface-->>DeepSeekV3: Return final output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes introduce a new parameter that is consistently threaded through multiple method signatures in a clear pattern. However, the diff spans six files with varying change types, from signature updates to test infrastructure modifications, requiring separate reasoning for each cohort. The logic itself is straightforward (parameter passing and conditional reduction control), but the breadth of affected components and the need to verify parameter propagation across layers increase the review effort.

Pre-merge checks

❌ Failed checks (2 warnings)

Description Check (⚠️ Warning)
Explanation: The PR description consists of the single sentence "Reverts #8106" and does not follow the repository's PR template. The template requires sections for Description (explaining the issue and solution), Test Coverage (listing relevant tests), and a PR Checklist; the current description is missing all of that substantive content, including what the revert addresses, why it was necessary, which tests safeguard the changes, and confirmation of coding-guideline and dependency checks.
Resolution: Expand the PR description to follow the template by adding (1) a Description section explaining what PR #8106 changed and why reverting it is necessary, (2) a Test Coverage section listing the MNNVL AlltoAll tests being re-enabled and any other relevant tests, and (3) a completed PR Checklist confirming adherence to coding guidelines, test coverage, and other pre-submission requirements.

Docstring Coverage (⚠️ Warning)
Explanation: Docstring coverage is 0.00%, below the required threshold of 80.00%.
Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)

Title Check (✅ Passed)
Explanation: The PR title "[None] [test] Add MNNVL AlltoAll tests to pre-merge" accurately describes the test additions in this changeset and follows the required format, with no ticket identifier and a [test] type designation. The title focuses on the test aspect and omits the supporting code changes to the fused MoE modules and modeling files, but because it clearly refers to a real and meaningful part of the change (the MNNVL AlltoAll test additions), it is meaningfully related to the changeset and lets a teammate understand a primary aspect of what is being changed.

📜 Recent review details

Configuration used: .coderabbit.yaml

Review profile: CHILL


📥 Commits

Reviewing files that changed from the base of the PR and between a7c2c8c and 35174b3.

📒 Files selected for processing (6)
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py (2 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (10 hunks)
  • tensorrt_llm/_torch/modules/fused_moe/interface.py (2 hunks)
  • tests/integration/test_lists/test-db/l0_dgx_b200.yml (1 hunks)
  • tests/integration/test_lists/test-db/l0_dgx_h100.yml (1 hunks)
  • tests/unittest/_torch/modules/test_fused_moe.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/_torch/modules/fused_moe/interface.py
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
  • tests/unittest/_torch/modules/test_fused_moe.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/_torch/modules/fused_moe/interface.py
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
  • tests/unittest/_torch/modules/test_fused_moe.py
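A contrived snippet illustrating several of the naming rules quoted above (all names invented):

```python
"""Toy module following the naming guidelines quoted above."""

G_CALL_COUNT = 0  # global: upper SNAKE_CASE with a 'G' prefix
MAX_RETRIES = 3   # constant: upper SNAKE_CASE


class LatencyTracker:  # class: PascalCase
    """Tracks request latencies.

    Attributes:
        latencies_ms: Latencies recorded so far, in milliseconds.
    """

    def __init__(self):
        # Externally visible members are initialized in the constructor.
        self.latencies_ms = []

    def record_latency(self, latency_ms: float) -> None:  # method: snake_case
        """Records one latency sample."""
        self.latencies_ms.append(latency_ms)

    def percentile_99(self) -> float:
        """Returns the 99th-percentile latency of the recorded samples."""
        k_99th_fraction = 0.99  # 'k' prefix: name would start with a digit
        ordered = sorted(self.latencies_ms)
        return ordered[int(k_99th_fraction * (len(ordered) - 1))]
```
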
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/_torch/modules/fused_moe/interface.py
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py
  • tests/unittest/_torch/modules/test_fused_moe.py
🧬 Code graph analysis (3)
tensorrt_llm/_torch/models/modeling_deepseekv3.py (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)
  • WideEPMoE (41-1086)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (2)
tensorrt_llm/_torch/utils.py (1)
  • Fp4QuantizedTensor (99-106)
tensorrt_llm/_mnnvl_utils.py (1)
  • MoEAlltoallInfo (342-349)
tests/unittest/_torch/modules/test_fused_moe.py (1)
tensorrt_llm/_torch/modules/fused_moe/fused_moe_wide_ep.py (1)
  • AlltoallMethodType (30-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Pre-commit Check
  • GitHub Check: Analyze (python)


@tensorrt-cicd (Collaborator):

PR_Github #22234 [ run ] triggered by Bot. Commit: 35174b3

@tensorrt-cicd (Collaborator):

PR_Github #22234 [ run ] completed with state SUCCESS. Commit: 35174b3
/LLM/main/L0_MergeRequest_PR pipeline #16764 (Partly Tested) completed with status: 'FAILURE'

@kaiyux (Member, Author) commented Oct 24, 2025

/bot run --only-multi-gpu-test

@tensorrt-cicd (Collaborator):

PR_Github #22353 [ run ] triggered by Bot. Commit: 35174b3

@tensorrt-cicd (Collaborator):

PR_Github #22353 [ run ] completed with state SUCCESS. Commit: 35174b3
/LLM/main/L0_MergeRequest_PR pipeline #16852 (Partly Tested) completed with status: 'SUCCESS'

@kaiyux (Member, Author) commented Oct 24, 2025

/bot run --add-multi-gpu-test --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #22397 [ run ] triggered by Bot. Commit: 35174b3

@tensorrt-cicd (Collaborator):

PR_Github #22397 [ run ] completed with state SUCCESS. Commit: 35174b3
/LLM/main/L0_MergeRequest_PR pipeline #16880 completed with status: 'FAILURE'
