register mixer2_gated_rms_norm ir kernel by wxsIcey · Pull Request #7 · wxsIcey/vllm

wxsIcey · 2026-03-10T08:47:53Z

Purpose

Register `mixer2_rms_norm_gated` as a vLLM IR op and rewrite `Mixer2RMSNormGated.forward_native` so that it dispatches correctly across all tensor-parallel (TP) configurations.

The implementation handles four cases:

| Case | Condition | Issue | Solution |
| --- | --- | --- | --- |
| 1 | `n_groups=1`, `tp_size>1` | Variance must be computed across all ranks (one global norm group, each rank holds only a slice) | AllReduce local sum-of-squares → compute global variance |
| 2 | `n_groups=1`, `tp_size=1` | No TP, local data is complete | Use IR op directly |
| 3 | `n_groups>1`, `n_groups % tp_size != 0` | Group boundaries straddle rank boundaries (a rank may hold half a group), so a local norm is incorrect | AllGather full tensor → normalize locally → slice back to local rank |
| 4 | `n_groups>1`, `n_groups % tp_size == 0` | Each rank holds an integer number of complete groups, so variance can be computed independently | Use IR op directly |
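
The four cases above reduce to a small decision on `n_groups` and `tp_size`. The following is a minimal sketch of that decision only; the function name `select_norm_path` and the returned labels are hypothetical and are not taken from the PR's code.

```python
def select_norm_path(n_groups: int, tp_size: int) -> str:
    """Return which path the table above prescribes (hypothetical helper)."""
    if n_groups == 1:
        # Case 1: one global group sharded across ranks needs a global variance.
        # Case 2: no tensor parallelism, local data is complete.
        return "all_reduce" if tp_size > 1 else "ir_op"
    if n_groups % tp_size != 0:
        # Case 3: groups straddle rank boundaries.
        return "all_gather"
    # Case 4: every rank owns whole groups.
    return "ir_op"


for n_groups, tp_size in [(1, 2), (1, 1), (3, 2), (4, 2)]:
    print(n_groups, tp_size, select_norm_path(n_groups, tp_size))
```

Under this reading, `n_groups>1` with `tp_size=1` also lands in case 4, since every `n_groups` is divisible by 1.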

Cases 2 and 4 require no collective communication and are handled by the IR op. Cases 1 and 3 require cross-rank communication that cannot be fused into a single kernel, so they are handled with explicit AllReduce / AllGather before calling into local computation.
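
To illustrate why cases 1 and 3 need communication, here is a pure-Python sketch that simulates the ranks as list slices. It is a simplification under stated assumptions: the gate and the learned weight are omitted, "variance" means the mean of squares as in RMS norm, and the collectives are replaced by plain Python sums and concatenation. All function names are hypothetical.

```python
import math


def rms_norm(x, eps=1e-6):
    """Plain RMS normalization of one group (no gate, no weight)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def all_reduce_norm(shards, eps=1e-6):
    """Case 1: one global group, each simulated rank holds a slice."""
    local_sums = [sum(v * v for v in s) for s in shards]  # per-rank sum of squares
    global_sum = sum(local_sums)                          # stand-in for AllReduce(sum)
    hidden = sum(len(s) for s in shards)
    scale = 1.0 / math.sqrt(global_sum / hidden + eps)
    return [[v * scale for v in s] for s in shards]


def all_gather_norm(shards, n_groups, rank, eps=1e-6):
    """Case 3: gather everything, normalize per group, slice back to `rank`."""
    full = [v for s in shards for v in s]                 # stand-in for AllGather
    group_size = len(full) // n_groups
    out = []
    for g in range(n_groups):
        out.extend(rms_norm(full[g * group_size:(g + 1) * group_size], eps))
    start = sum(len(s) for s in shards[:rank])
    return out[start:start + len(shards[rank])]


# Case 1: 8 values, 1 group, 2 simulated ranks.
full = [1.0, -2.0, 3.0, 0.5, 4.0, -1.5, 2.5, 0.0]
shards = [full[:4], full[4:]]
reference = rms_norm(full)
sharded = [v for s in all_reduce_norm(shards) for v in s]
naive = [v for s in shards for v in rms_norm(s)]  # local-only norm, incorrect here

# Case 3: 12 values, 3 groups of 4, 2 ranks of 6 (the middle group straddles ranks).
full12 = [float(i + 1) for i in range(12)]
shards12 = [full12[:6], full12[6:]]
expected12 = [v for g in range(3) for v in rms_norm(full12[4 * g:4 * g + 4])]
gathered12 = all_gather_norm(shards12, 3, 0) + all_gather_norm(shards12, 3, 1)
```

In this sketch `sharded` matches `reference` up to floating-point rounding, while `naive` does not, because each slice on its own has a different mean of squares than the full vector.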

Because forward_native now covers all cases (including the optimized IR op paths for cases 2 and 4), forward_cuda is fully redundant and can be removed.

Test Plan

Test Result


Signed-off-by: Icey <1790571317@qq.com>

wxsIcey added 3 commits March 10, 2026 08:45

register mixer2_gated_rms_norm ir op

f29ae45

Signed-off-by: Icey <1790571317@qq.com>

wrap triton kernel to custom op

4fd5442

Signed-off-by: Icey <1790571317@qq.com>

change forward_cuda

dfbb087

Signed-off-by: Icey <1790571317@qq.com>

wxsIcey wants to merge 3 commits into luka/vllm-ir/rms-norm from wxs/vllm_ir/mixer2-gated-rms-norm
