[AutoDeploy]: Dynamically select a tile size for fused_mlp_moe_kernel #8511

@nzmora-nvidia

Description

🚀 The feature, motivation and pitch

AutoDeploy (AD) needs to be able to configure its Triton kernels so that optimal parameters, such as the tile size of fused_mlp_moe_kernel, are chosen for different batch sizes.

vLLM selects a config dynamically based on batch size:
https://sourcegraph.com/github.com/vllm-project/vllm/-/blob/vllm/model_executor/layers/fused_moe/fused_moe.py?L815

vLLM's configs:
https://sourcegraph.com/github.com/vllm-project/vllm/-/tree/vllm/model_executor/layers/fused_moe/configs
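
To make the request concrete, here is a minimal sketch of batch-size-based config selection, loosely in the spirit of the vLLM approach linked above. It is not AutoDeploy's or vLLM's actual code: `TUNED_CONFIGS`, `select_config`, and every tile value in the table are hypothetical placeholders, and the nearest-key lookup is an assumption about what a reasonable selection rule could look like.

```python
from typing import Dict

# Hypothetical tuned table: batch size (number of tokens) -> Triton tile parameters.
# The values are illustrative placeholders, not measured optima.
TUNED_CONFIGS: Dict[int, Dict[str, int]] = {
    1: {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 64},
    64: {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 64},
    1024: {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 32},
}


def select_config(
    num_tokens: int,
    configs: Dict[int, Dict[str, int]] = TUNED_CONFIGS,
) -> Dict[str, int]:
    """Return the config whose tuned batch size is closest to num_tokens."""
    if num_tokens < 1:
        raise ValueError("num_tokens must be positive")
    if not configs:
        raise ValueError("configs must not be empty")
    # Pick the tuned batch size with the smallest absolute distance.
    # On a tie, min() keeps the first key in insertion order.
    nearest = min(configs, key=lambda tuned: abs(tuned - num_tokens))
    return configs[nearest]


if __name__ == "__main__":
    for batch in (1, 50, 5000):
        print(batch, select_config(batch))
```

The selected dictionary would then be passed as the kernel's compile-time tile parameters at launch. A per-device table (for example, one file per GPU model) could be layered on top of the same lookup.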

Alternatives

No response

Additional context

No response

Metadata

Labels

AutoDeploy (<NV> AutoDeploy Backend)

Projects

Status

In review
