[fix] Add 1 and draft_token_num to seq_len when overlap scheduling is enabled during memory estimation #5343

Merged
merged 6 commits into from
Jun 24, 2025

Conversation

HuiGao-NV
Collaborator

When estimating memory consumption, we need to create a temporary KV cache. With overlap enabled, we need to leave space for one more token so the forward pass can finish.
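
For illustration, the adjustment named in the PR title can be sketched as below. This is a minimal sketch and is not taken from the PR diff: the function name `estimation_seq_len` and its parameters are hypothetical, and it assumes the extra space is simply one token plus the number of draft tokens, added only when overlap scheduling is enabled.

```python
def estimation_seq_len(seq_len: int,
                       overlap_enabled: bool,
                       draft_token_num: int = 0) -> int:
    """Return the sequence length to reserve in the temporary KV cache.

    Hypothetical helper: with overlap scheduling enabled, reserve one
    extra token plus the draft tokens so the forward pass can finish.
    """
    if not overlap_enabled:
        # No overlap: the requested sequence length is used unchanged.
        return seq_len
    # Overlap enabled: add 1 and draft_token_num, as the PR title describes.
    return seq_len + 1 + draft_token_num
```

Under these assumptions, `estimation_seq_len(1024, True, 3)` gives 1028, while the same call with overlap disabled gives 1024.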

@HuiGao-NV HuiGao-NV requested review from a team as code owners June 19, 2025 00:17
@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9423 [ run ] triggered by Bot

@QiJune QiJune requested a review from yweng0828 June 19, 2025 01:38
@tensorrt-cicd
Collaborator

PR_Github #9423 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6916 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9502 [ run ] triggered by Bot

@HuiGao-NV HuiGao-NV changed the title [fix] Add one to seq_len for overlap during memory estimation [fix] Add 1 and draft_token_num to seq_len when overlap scheduling is enabled during memory estimation Jun 19, 2025
@tensorrt-cicd
Collaborator

PR_Github #9502 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6971 completed with status: 'FAILURE'

@HuiGao-NV HuiGao-NV force-pushed the extra_token_for_overlap branch from 6be9345 to d7cf3d9 Compare June 20, 2025 01:53
@HuiGao-NV
Collaborator Author

The previous CI run failed to collect cases for RTX:
"[2025-06-19T11:50:47.135Z] ===================== 15797 deselected, 1 warning in 7.26s ====================="
It needs to be rerun.

@HuiGao-NV
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9542 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9542 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7002 completed with status: 'SUCCESS'

@HuiGao-NV HuiGao-NV requested a review from QiJune June 20, 2025 09:45
@HuiGao-NV HuiGao-NV enabled auto-merge (squash) June 20, 2025 09:45
@HuiGao-NV HuiGao-NV requested a review from a team June 23, 2025 22:08
Collaborator

@QiJune QiJune left a comment

LGTM

@HuiGao-NV
Collaborator Author

/bot skip --comment="All CI cases have passed"

@tensorrt-cicd
Collaborator

PR_Github #9633 [ skip ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9633 [ skip ] completed with state SUCCESS
Skipping testing for commit 863999a

@HuiGao-NV HuiGao-NV merged commit e16c1be into NVIDIA:main Jun 24, 2025
3 checks passed
@HuiGao-NV HuiGao-NV deleted the extra_token_for_overlap branch June 25, 2025 13:27
4 participants