Support paged kv cache for benchmarks #130

yunfeng-scale · 2023-10-26T04:50:02Z

This adds support in the benchmarks for building a new engine with paged KV cache, or for loading an existing engine that was built with paged KV cache. Without this change, the benchmarks fail with tensor name errors.

Listing this as a draft, since I think there is a memory management issue in either the benchmarks or generation that causes the KV cache not to be freed: memory usage increases steadily and I eventually hit OOM.

Edit: converting to ready for review, since I think the OOM is a separate issue from this PR #283

poweiw · 2025-06-05T21:44:12Z

@yunfeng-scale is this PR still relevant?

Support paged kv cache for benchmarks

c5143ba

yunfeng-scale marked this pull request as ready for review November 15, 2023 19:12

poweiw requested a review from schetlur-nv May 16, 2025 21:31

poweiw added the KV-Cache Management, triaged, and Community want to contribute labels May 16, 2025
