Support paged kv cache for benchmarks #130

Open

yunfeng-scale wants to merge 1 commit into base: release/0.5.0

Conversation

yunfeng-scale commented on Oct 26, 2023

This adds support for building a new engine with paged KV cache, or for loading an existing engine that was built with paged KV cache. Without this change, the benchmarks fail with tensor name errors.
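For context, here is a rough sketch (not code from this PR) of how a benchmark could check whether a serialized engine was built with paged KV cache before binding buffers. The tensor-name prefixes below are assumptions based on TensorRT-LLM naming conventions and may differ across versions:

```python
import tensorrt as trt

def engine_uses_paged_kv_cache(engine_path: str) -> bool:
    """Best-effort check: does this engine expose paged-KV-cache tensors?"""
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    names = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]
    # Assumption: paged engines bind block-pointer tensors, while
    # contiguous-cache engines bind per-layer past_key_value tensors.
    # Binding the wrong set is what produces the tensor name errors.
    return any(name.startswith("kv_cache_block_pointers") for name in names)
```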

Listing this as a draft, since I think there's an issue with memory management in either the benchmarks or generation that causes the KV cache not to be freed; I'm seeing steadily increasing memory usage and eventually an OOM.
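One rough way to confirm the growth, as a sketch only (assuming a PyTorch environment; `run_one_batch` is a hypothetical stand-in for one benchmark iteration, not a function in this repo):

```python
import torch

def log_gpu_memory(num_steps, run_one_batch):
    free0, total = torch.cuda.mem_get_info()  # (free, total) on current device
    for step in range(num_steps):
        run_one_batch()
        torch.cuda.synchronize()
        free, _ = torch.cuda.mem_get_info()
        # Steadily increasing "growth" across steps suggests KV cache
        # (or other device memory) is not being freed between iterations.
        print(f"step {step}: used {(total - free) / 2**20:.1f} MiB "
              f"(growth {(free0 - free) / 2**20:+.1f} MiB)")
```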

Edit: converting to ready for review, since I think the OOM is a separate issue from this PR; see #283.

yunfeng-scale marked this pull request as ready for review on November 15, 2023
poweiw requested a review from schetlur-nv on May 16, 2025
poweiw added the KV-Cache Management, triaged, and Community want to contribute labels on May 16, 2025
poweiw (Collaborator) commented on Jun 5, 2025

@yunfeng-scale is this PR still relevant?
