register mixer2_gated_rms_norm ir kernel #7

Open
wxsIcey wants to merge 3 commits into luka/vllm-ir/rms-norm from wxs/vllm_ir/mixer2-gated-rms-norm
wxsIcey commented Mar 10, 2026

Purpose

Register mixer2_rms_norm_gated as a vLLM IR op and rewrite Mixer2RMSNormGated.forward_native to dispatch correctly across all tensor-parallel configurations.

The implementation handles four cases:

| Case | Condition | Issue | Solution |
| --- | --- | --- | --- |
| 1 | n_groups=1, tp_size>1 | Variance must be computed across all ranks (one global norm group, each rank holds only a slice) | AllReduce local sum-of-squares → compute global variance |
| 2 | n_groups=1, tp_size=1 | No TP, local data is complete | Use IR op directly |
| 3 | n_groups>1, n_groups % tp_size != 0 | Group boundaries straddle rank boundaries (a rank may hold half a group), local norm is incorrect | AllGather full tensor → normalize locally → slice back to local rank |
| 4 | n_groups>1, n_groups % tp_size == 0 | Each rank holds an integer number of complete groups, variance can be computed independently | Use IR op directly |
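
The case selection in the table can be sketched as a small pure-Python function. This only illustrates the branching conditions and is not the code in this PR; `NormPath` and `select_norm_path` are hypothetical names introduced for the example.

```python
from enum import Enum


class NormPath(Enum):
    """Hypothetical labels for the four cases in the table above."""
    ALL_REDUCE = 1     # case 1: one global group sharded across ranks
    LOCAL_SINGLE = 2   # case 2: one group, no tensor parallelism
    ALL_GATHER = 3     # case 3: groups straddle rank boundaries
    LOCAL_GROUPED = 4  # case 4: each rank holds whole groups


def select_norm_path(n_groups: int, tp_size: int) -> NormPath:
    """Pick the case from n_groups and tp_size, following the table."""
    if n_groups < 1 or tp_size < 1:
        raise ValueError("n_groups and tp_size must be positive")
    if n_groups == 1:
        # A single norm group: communication is needed only when sharded.
        return NormPath.ALL_REDUCE if tp_size > 1 else NormPath.LOCAL_SINGLE
    if n_groups % tp_size != 0:
        # Some rank would hold a partial group, so a local norm is wrong.
        return NormPath.ALL_GATHER
    # Every rank holds an integer number of complete groups.
    return NormPath.LOCAL_GROUPED
```

Note that `n_groups > 1` with `tp_size == 1` falls into case 4 under these conditions, since `n_groups % 1 == 0`.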

Cases 2 and 4 require no collective communication and are handled by the IR op. Cases 1 and 3 require cross-rank communication that cannot be fused into a single kernel, so they are handled with explicit AllReduce / AllGather before calling into local computation.
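
To see why cases 1 and 3 need communication, here is a minimal simulation using plain Python lists in place of tensors. It is a sketch under stated assumptions: ranks are simulated as list shards, the collectives are replaced by ordinary list operations, the gating and the learned weight are omitted, and `EPS` and all function names are illustrative rather than taken from this PR.

```python
import math

EPS = 1e-5  # illustrative epsilon, not taken from the PR


def rms_norm(x, eps=EPS):
    """Plain RMS norm over one complete group (no gate, no weight)."""
    inv = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv for v in x]


def split(x, tp_size):
    """Shard a hidden vector evenly across simulated ranks."""
    chunk = len(x) // tp_size
    return [x[r * chunk:(r + 1) * chunk] for r in range(tp_size)]


def case1_all_reduce(shards, eps=EPS):
    """Case 1: sum local sums of squares, then normalize each shard."""
    local_sq = [sum(v * v for v in s) for s in shards]
    global_sq = sum(local_sq)  # stands in for an AllReduce(SUM)
    total = sum(len(s) for s in shards)
    inv = 1.0 / math.sqrt(global_sq / total + eps)
    return [[v * inv for v in s] for s in shards]


def case3_all_gather(shards, n_groups, eps=EPS):
    """Case 3: gather the full vector, normalize per group, slice back."""
    full = [v for s in shards for v in s]  # stands in for an AllGather
    group_size = len(full) // n_groups
    normed = []
    for g in range(n_groups):
        normed.extend(rms_norm(full[g * group_size:(g + 1) * group_size], eps))
    chunk = len(shards[0])
    return [normed[r * chunk:(r + 1) * chunk] for r in range(len(shards))]
```

For example, with a hidden size of 12, `tp_size=2`, and `n_groups=3`, rank 0 holds elements 0 to 5, which is all of group 0 plus half of group 1. Normalizing that shard on its own would use the wrong statistics, whereas `case3_all_gather(split(x, 2), 3)` reproduces the per-group result of the unsharded vector.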

Because forward_native now covers all cases (including the optimized IR op paths for cases 2 and 4), forward_cuda is fully redundant and can be removed.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

wxsIcey added 3 commits March 10, 2026 08:45
Signed-off-by: Icey <1790571317@qq.com>
Signed-off-by: Icey <1790571317@qq.com>
Signed-off-by: Icey <1790571317@qq.com>