
FEA: support use_paged_context_fmha feature to enable enable_kv_cache… #909


Open · wants to merge 1 commit into base: main

Conversation

yunzhongyan0

Makes the Qwen model support the use_paged_context_fmha feature; at the same time, enable_kv_cache can be used in the Triton server.
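
For context, a minimal sketch of how the two pieces typically fit together at build and serving time. This is not taken from this PR: the trtllm-build flag spelling, the config.pbtxt parameter name, and all paths are assumptions based on contemporary TensorRT-LLM / tensorrtllm_backend releases.

```python
# Minimal sketch (assumptions: trtllm-build is on PATH, the
# --use_paged_context_fmha flag spelling matches your TensorRT-LLM
# release, and the checkpoint/engine paths are placeholders).
import subprocess

# Build the Qwen engine with paged context FMHA enabled -- the
# capability this PR wires up for the Qwen model:
subprocess.run(
    [
        "trtllm-build",
        "--checkpoint_dir", "./qwen_ckpt",
        "--output_dir", "./qwen_engine",
        "--use_paged_context_fmha", "enable",
    ],
    check=True,
)

# On the serving side, KV cache reuse is toggled in the Triton
# tensorrtllm_backend model's config.pbtxt (assumed parameter name):
#
#   parameters: {
#     key: "enable_kv_cache_reuse"
#     value: { string_value: "true" }
#   }
```

Paged context FMHA is generally a prerequisite for KV cache block reuse in TensorRT-LLM, which is why the two options appear together in this PR.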

@poweiw added the triaged (Issue has been triaged by maintainers), Community want to contribute (PRs initiated from Community), and KV-Cache Management (kv-cache management for efficient LLM inference) labels on May 27, 2025.
poweiw (Collaborator) commented May 27, 2025

Hello @yunzhongyan0! Can you check with the latest TRTLLM version and see if the bug is still relevant?

Labels
Community want to contribute · KV-Cache Management · triaged
Projects
None yet
Development


3 participants