Does LLM Cache support V100 hardware? #791

Open

jlcoo opened this issue Mar 4, 2025 · 1 comment

jlcoo commented Mar 4, 2025

I am using a V100 GPU to test deploying the Distributed KV Cache example. Unfortunately, it failed because it requires the flash attention backend.

Collaborator

DwyaneShi commented Mar 4, 2025

@jlcoo Thanks for trying out the distributed KV cache offloading feature. We will support more attention backends soon; please stay tuned.
