Commit d7cf3d9

Add max_draft_tokens more tokens to the KV cache seq_len for memory estimation

Signed-off-by: Hui Gao <[email protected]>
1 parent: bb528cf · commit: d7cf3d9

1 file changed: +3 -1 lines changed
tensorrt_llm/_torch/pyexecutor/_util.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -153,10 +153,12 @@ def _get_token_num_for_estimation(self) -> int:
         num_cache_blocks = 0
         num_extra_tokens_per_seq = 1 # account for generated tokens
         pytorch_backend_config = executor_config.pytorch_backend_config
+        spec_cfg = executor_config.speculative_config
         if not pytorch_backend_config.disable_overlap_scheduler:
             num_extra_tokens_per_seq = num_extra_tokens_per_seq + 1
+            if spec_cfg is not None:
+                num_extra_tokens_per_seq += spec_cfg.max_draft_tokens
 
-        spec_cfg = executor_config.speculative_config
         if spec_cfg is not None:
             num_extra_tokens_per_seq += spec_cfg.max_draft_tokens
             num_extra_tokens_per_seq += spec_cfg.num_extra_kv_tokens
```
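
For readers skimming the hunk, here is a minimal, self-contained sketch of the per-sequence extra-token accounting after this change. It is an illustration only: the `SpecConfig` dataclass and `extra_tokens_per_seq` function are hypothetical stand-ins, and the sketch assumes `disable_overlap_scheduler`, `max_draft_tokens`, and `num_extra_kv_tokens` behave exactly as shown in the diff above.

```python
# Hypothetical sketch of the per-sequence extra-token accounting after this
# commit (not the repository's actual code).
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecConfig:
    max_draft_tokens: int = 0     # draft tokens proposed per decoding step
    num_extra_kv_tokens: int = 0  # extra KV slots required by the spec method


def extra_tokens_per_seq(disable_overlap_scheduler: bool,
                         spec_cfg: Optional[SpecConfig]) -> int:
    extra = 1  # account for the token generated each step
    if not disable_overlap_scheduler:
        extra += 1  # overlap scheduler keeps one extra in-flight token
        if spec_cfg is not None:
            # new in this commit: the in-flight step also carries draft tokens
            extra += spec_cfg.max_draft_tokens
    if spec_cfg is not None:
        extra += spec_cfg.max_draft_tokens
        extra += spec_cfg.num_extra_kv_tokens
    return extra


# Overlap scheduling on, speculative decoding with 4 draft tokens:
# 1 + 1 + 4 (new) + 4 + 0 = 10 extra tokens per sequence for KV estimation.
print(extra_tokens_per_seq(False, SpecConfig(max_draft_tokens=4)))
```

Under these assumptions, the commit reserves `max_draft_tokens` additional KV-cache tokens per sequence whenever the overlap scheduler and speculative decoding are both active (10 instead of 6 in the example above), which is the extra headroom the commit title refers to.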
