Open
Labels
AutoDeploy: <NV> AutoDeploy Backend
Description
🚀 The feature, motivation and pitch
AutoDeploy (AD) needs to be able to configure its Triton kernels so that optimal launch parameters are chosen for different batch sizes.
Reference: vLLM selects a fused MoE kernel config dynamically based on batch size:
https://sourcegraph.com/github.com/vllm-project/vllm/-/blob/vllm/model_executor/layers/fused_moe/fused_moe.py?L815
Config files:
https://sourcegraph.com/github.com/vllm-project/vllm/-/tree/vllm/model_executor/layers/fused_moe/configs
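A minimal sketch of what batch-size-based selection could look like, assuming a nearest-key lookup over a table of pre-tuned entries. The table `TUNED_CONFIGS`, the function `select_config`, and all numeric values are hypothetical and for illustration only; they are not taken from vLLM or AutoDeploy. The parameter names (`BLOCK_SIZE_M`, `num_warps`, etc.) are typical Triton launch parameters and are also an assumption about what AD would want to tune.

```python
from typing import Dict

KernelConfig = Dict[str, int]

# Hypothetical table: batch size (number of tokens) -> Triton launch parameters.
# In practice these values would come from offline tuning on the target GPU.
TUNED_CONFIGS: Dict[int, KernelConfig] = {
    1: {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 32, "BLOCK_SIZE_K": 64,
        "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3},
    16: {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64, "BLOCK_SIZE_K": 64,
         "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 4},
    64: {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
         "GROUP_SIZE_M": 8, "num_warps": 4, "num_stages": 4},
    256: {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
          "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 4},
}


def select_config(batch_size: int,
                  configs: Dict[int, KernelConfig] = TUNED_CONFIGS) -> KernelConfig:
    """Return the tuned entry whose batch-size key is closest to batch_size."""
    if not configs:
        raise ValueError("no tuned configs available")
    nearest_key = min(configs, key=lambda m: abs(m - batch_size))
    return configs[nearest_key]


if __name__ == "__main__":
    for m in (1, 20, 100, 1000):
        print(m, select_config(m))
```

With a lookup like this, the kernel launch site would call `select_config(num_tokens)` at runtime and pass the returned values as the kernel's meta-parameters, so small decode batches and large prefill batches can use different tile sizes without retuning on every call.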
Alternatives
No response
Additional context
No response
Status
In review