[AutoDeploy]: Dynamically select a tile size for `fused_mlp_moe_kernel`

### 🚀 The feature, motivation and pitch

AutoDeploy (AD) needs to be able to configure its Triton kernels so that optimal parameters (such as tile sizes) are chosen for different batch sizes.
For reference, vLLM selects a kernel config dynamically based on batch size:

- Selection logic: https://sourcegraph.com/github.com/vllm-project/vllm/-/blob/vllm/model_executor/layers/fused_moe/fused_moe.py?L815
- Tuned configs: https://sourcegraph.com/github.com/vllm-project/vllm/-/tree/vllm/model_executor/layers/fused_moe/configs
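
A minimal sketch of the nearest-batch-size lookup, assuming the approach roughly mirrors the linked vLLM code: a JSON table maps tuned batch sizes (as string keys) to Triton launch parameters, and at runtime the entry whose key is closest to the actual token count is picked. The helper names `load_tile_configs` and `select_tile_config`, and all numbers in the table, are hypothetical placeholders, not measured results or an existing AD API.

```python
import json
from typing import Dict

# Hypothetical tuned table shaped like the vLLM JSON configs:
# string batch-size keys -> Triton launch parameters (placeholder values).
EXAMPLE_CONFIG_JSON = """
{
  "1":   {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 32,  "BLOCK_SIZE_K": 64, "GROUP_SIZE_M": 1,  "num_warps": 4, "num_stages": 4},
  "16":  {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 64, "GROUP_SIZE_M": 1,  "num_warps": 4, "num_stages": 4},
  "64":  {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 64, "GROUP_SIZE_M": 8,  "num_warps": 4, "num_stages": 3},
  "256": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64, "GROUP_SIZE_M": 16, "num_warps": 8, "num_stages": 3}
}
"""


def load_tile_configs(text: str) -> Dict[int, dict]:
    """Parse the JSON table and convert its string keys to integer batch sizes."""
    return {int(k): v for k, v in json.loads(text).items()}


def select_tile_config(configs: Dict[int, dict], num_tokens: int) -> dict:
    """Return the config whose tuned batch size is closest to num_tokens.

    On a tie, min() keeps the first key in insertion order.
    """
    if not configs:
        raise ValueError("no tile configs available")
    nearest = min(configs, key=lambda m: abs(m - num_tokens))
    return configs[nearest]


configs = load_tile_configs(EXAMPLE_CONFIG_JSON)
# 48 tokens is closest to the tuned entry for 64, so its BLOCK_SIZE_M (32) is used.
print(select_tile_config(configs, 48)["BLOCK_SIZE_M"])
```

In AD, the selected dict would then presumably be passed as the `BLOCK_SIZE_*`, `num_warps`, and `num_stages` launch parameters of `fused_mlp_moe_kernel`; how that kernel actually exposes these parameters is an assumption here.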


### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.