[Cache Offload] Improve radix cache offload benchmark #2534

Draft: wants to merge 15 commits into base: xiezhq-hierarchical

@Edenzzzz Edenzzzz commented Dec 20, 2024

Motivation

  1. Improve benchmark sentence sampling and argument parsing.
  2. Add a README with benchmark instructions.

cc @xiezhq-hermann

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@Edenzzzz Edenzzzz changed the title Improve bench sentence sampling and arg parsing; add README for benchark instructions Improve radix cache offload benchmark Dec 20, 2024
@Edenzzzz Edenzzzz changed the title Improve radix cache offload benchmark [Cache Offload] Improve radix cache offload benchmark Dec 21, 2024
@Edenzzzz Edenzzzz marked this pull request as draft December 22, 2024 02:12
@Edenzzzz Edenzzzz (Author) commented Dec 28, 2024

Currently, offload seems to decrease performance on H100.

  • context len = 3000
  • num groups = 100
  • 100 sentences per group
  • Command:
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
--host 0.0.0.0 --tensor-parallel-size 4 --enable-hierarchical-cache

Sentences sampled from NLTK

  • Offload on: 191.94 req/s, hit rate ~79%
  • Offload off: 230.87 req/s, hit rate 79.11%

Random Latin text from lorem (original benchmark)

  • Offload on: 141.44 req/s, hit rate ~79%
  • Offload off: 142.67 req/s, hit rate 79%
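
To put the numbers above on a common scale, here is a minimal sketch that computes the relative throughput change with offload on versus off. The helper name `relative_change` and the dictionary layout are illustrative only and are not part of the benchmark script; the throughput values are copied from the results above.

```python
def relative_change(on: float, off: float) -> float:
    """Percent change in throughput with offload on, relative to offload off."""
    return (on - off) / off * 100.0


# (offload on, offload off) in req/s, taken from the results reported above.
results = {
    "NLTK": (191.94, 230.87),
    "lorem": (141.44, 142.67),
}

changes = {name: relative_change(on, off) for name, (on, off) in results.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

By this measure the NLTK workload loses roughly 17% of its throughput with offload enabled, while the lorem workload changes by under 1%, with the hit rate at about 79% in all four runs.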

@xiezhq-hermann xiezhq-hermann force-pushed the xiezhq-hierarchical branch 3 times, most recently from 2bd500a to 1853cf2 Compare January 2, 2025 08:06