[fix] Add 1 and draft_token_num to seq_len when overlap scheduling is enabled during memory estimation #5343

HuiGao-NV · 2025-06-19T00:17:46Z

When estimating memory consumption, we need to allocate a temporary KV cache. With overlap scheduling enabled, we need to leave space for one more token so that the forward pass can finish.
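
A minimal sketch of the adjustment described in the PR title, not the actual change in `_util.py`. The function name `estimation_seq_len` and its parameters are hypothetical and chosen only for illustration; the sketch assumes the extra draft tokens are added only when overlap scheduling is enabled, as the title states.

```python
def estimation_seq_len(base_seq_len: int,
                       overlap_enabled: bool,
                       max_draft_tokens: int = 0) -> int:
    """Return the per-sequence length to reserve in the temporary KV cache.

    Hypothetical helper: names and signature are assumptions, not the
    TensorRT-LLM API.
    """
    if not overlap_enabled:
        # Without overlap scheduling, no extra token slots are reserved.
        return base_seq_len
    # With overlap scheduling, reserve one extra token slot plus room for
    # the draft tokens (zero when speculative decoding is not used).
    return base_seq_len + 1 + max_draft_tokens


if __name__ == "__main__":
    print(estimation_seq_len(128, overlap_enabled=False, max_draft_tokens=4))
    print(estimation_seq_len(128, overlap_enabled=True))
    print(estimation_seq_len(128, overlap_enabled=True, max_draft_tokens=4))
```

Under these assumptions, a 128-token sequence reserves 128 slots without overlap, 129 with overlap and no draft tokens, and 133 with overlap and 4 draft tokens.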

HuiGao-NV · 2025-06-19T00:17:56Z

/bot run

tensorrt-cicd · 2025-06-19T00:23:17Z

PR_Github #9423 [ run ] triggered by Bot

tensorrt_llm/_torch/pyexecutor/_util.py

tensorrt-cicd · 2025-06-19T05:30:36Z

PR_Github #9423 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6916 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

HuiGao-NV · 2025-06-19T11:13:22Z

/bot run

tensorrt-cicd · 2025-06-19T11:19:08Z

PR_Github #9502 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-19T12:48:13Z

PR_Github #9502 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6971 completed with status: 'FAILURE'

HuiGao-NV · 2025-06-20T01:56:15Z

The previous CI run failed while collecting test cases for RTX:
"[2025-06-19T11:50:47.135Z] ===================== 15797 deselected, 1 warning in 7.26s ====================="
Need to rerun.

HuiGao-NV · 2025-06-20T01:56:23Z

/bot run

tensorrt-cicd · 2025-06-20T02:01:54Z

PR_Github #9542 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-20T06:08:14Z

PR_Github #9542 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7002 completed with status: 'SUCCESS'

QiJune

LGTM

HuiGao-NV · 2025-06-24T03:30:00Z

/bot skip --comment="All CI cases have passed"

tensorrt-cicd · 2025-06-24T03:35:22Z

PR_Github #9633 [ skip ] triggered by Bot

tensorrt-cicd · 2025-06-24T03:43:41Z

PR_Github #9633 [ skip ] completed with state SUCCESS
Skipping testing for commit 863999a

HuiGao-NV requested review from a team as code owners June 19, 2025 00:17

HuiGao-NV requested review from schetlur-nv, pcastonguay and dongxuy04 June 19, 2025 00:17

QiJune requested a review from yweng0828 June 19, 2025 01:38

QiJune reviewed Jun 19, 2025

tensorrt_llm/_torch/pyexecutor/_util.py

QiJune reviewed Jun 19, 2025

tensorrt_llm/_torch/pyexecutor/_util.py

HuiGao-NV changed the title ~~[fix] Add one to seq_len for overlap during memory estimation~~ [fix] Add 1 and draft_token_num to seq_len when overlap scheduling is enabled during memory estimation Jun 19, 2025

yweng0828 approved these changes Jun 19, 2025

HuiGao-NV added 3 commits June 20, 2025 01:52

Add one to seq_len for overlap during memory estimation

00702a1

Signed-off-by: Hui Gao <[email protected]>

Remove file added by mistake

bb528cf

Signed-off-by: Hui Gao <[email protected]>

Add max_draft_tokens more tokens to kv cache seq_len for memory estimation

d7cf3d9

Signed-off-by: Hui Gao <[email protected]>

HuiGao-NV force-pushed the extra_token_for_overlap branch from 6be9345 to d7cf3d9 Compare June 20, 2025 01:53

HuiGao-NV requested a review from QiJune June 20, 2025 09:45

HuiGao-NV enabled auto-merge (squash) June 20, 2025 09:45

HuiGao-NV added 2 commits June 20, 2025 17:45

Merge branch 'main' into extra_token_for_overlap

0252a9b

Merge branch 'main' into extra_token_for_overlap

3a19409

HuiGao-NV requested a review from a team June 23, 2025 22:08

QiJune approved these changes Jun 24, 2025

Merge branch 'main' into extra_token_for_overlap

863999a

HuiGao-NV merged commit e16c1be into NVIDIA:main Jun 24, 2025
3 checks passed

HuiGao-NV deleted the extra_token_for_overlap branch June 25, 2025 13:27
