Commit 7b210ae

test: add unit tests for Llama4 min_latency code (#4980)
Signed-off-by: Po-Han Huang <[email protected]>
1 parent 7ddc4d6 commit 7b210ae

2 files changed: 444 additions, 0 deletions

tensorrt_llm/_torch/models/modeling_llama_min_latency.py

Lines changed: 1 addition & 0 deletions
@@ -471,6 +471,7 @@ def __init__(
         if num_experts == 128 \
                 and hidden_size == 5120 \
                 and intermediate_size == 8192 \
+                and model_config.quant_config is not None \
                 and model_config.quant_config.quant_mode.has_fp8_qdq() \
                 and model_config.mapping.moe_tp_size == 8 \
                 and model_config.mapping.moe_ep_size == 1 \
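
The added line relies on Python's short-circuit `and`: when `quant_config` is `None`, the chain stops before `quant_config.quant_mode` is dereferenced. The following is a minimal sketch of that guard, not the TensorRT-LLM code. `FakeQuantMode`, `FakeQuantConfig`, `FakeModelConfig`, and both helper functions are hypothetical stand-ins invented for illustration; only the attribute names `quant_config`, `quant_mode`, and `has_fp8_qdq()` come from the diff.

```python
from dataclasses import dataclass
from typing import Optional


class FakeQuantMode:
    """Hypothetical stand-in exposing only the method used in the diff."""

    def __init__(self, fp8_qdq: bool) -> None:
        self._fp8_qdq = fp8_qdq

    def has_fp8_qdq(self) -> bool:
        return self._fp8_qdq


@dataclass
class FakeQuantConfig:
    """Hypothetical stand-in for a quantization config object."""
    quant_mode: FakeQuantMode


@dataclass
class FakeModelConfig:
    """Hypothetical stand-in; quant_config may be absent (None)."""
    quant_config: Optional[FakeQuantConfig] = None


def uses_fp8_path_guarded(model_config: FakeModelConfig) -> bool:
    # Mirrors the guarded condition: the None check comes first, so the
    # attribute access on the right is skipped when quant_config is None.
    return (model_config.quant_config is not None
            and model_config.quant_config.quant_mode.has_fp8_qdq())


def uses_fp8_path_unguarded(model_config: FakeModelConfig) -> bool:
    # Mirrors the condition without the None check: raises AttributeError
    # when quant_config is None.
    return model_config.quant_config.quant_mode.has_fp8_qdq()
```

With an unquantized config, the guarded helper returns `False`, while the unguarded one raises `AttributeError` because `None` has no `quant_mode` attribute.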