
FEA: support use_paged_context_fmha feature to enable enable_kv_cache… #909


Open · wants to merge 1 commit into base: main

Conversation

yunzhongyan0

Makes the Qwen model support the use_paged_context_fmha feature; at the same time, enable_kv_cache can be used in the Triton server.
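
For context, a minimal sketch of how the two pieces typically fit together at build and serving time. This is not taken from this PR: the trtllm-build flag spelling, the config.pbtxt parameter name, and all paths are assumptions based on contemporary TensorRT-LLM / tensorrtllm_backend releases.

```python
# Minimal sketch (assumptions: trtllm-build is on PATH, the
# --use_paged_context_fmha flag spelling matches your TensorRT-LLM
# release, and the checkpoint/engine paths are placeholders).
import subprocess

# Build the Qwen engine with paged context FMHA enabled -- the
# capability this PR wires up for the Qwen model:
subprocess.run(
    [
        "trtllm-build",
        "--checkpoint_dir", "./qwen_ckpt",
        "--output_dir", "./qwen_engine",
        "--use_paged_context_fmha", "enable",
    ],
    check=True,
)

# On the serving side, KV cache reuse is toggled in the Triton
# tensorrtllm_backend model's config.pbtxt (assumed parameter name):
#
#   parameters: {
#     key: "enable_kv_cache_reuse"
#     value: { string_value: "true" }
#   }
```

Paged context FMHA is generally a prerequisite for KV cache block reuse in TensorRT-LLM, which is why the two options appear together in this PR.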

@poweiw added the triaged (Issue has been triaged by maintainers), Community want to contribute (PRs initiated from Community), and KV-Cache Management (kv-cache management for efficient LLM inference) labels on May 27, 2025.
poweiw (Collaborator) commented May 27, 2025

Hello @yunzhongyan0! Can you check with the latest TRTLLM version and see if the bug is still relevant?

Labels
Community want to contribute · KV-Cache Management · triaged
Projects
None yet
Development


3 participants