@@ -4,7 +4,7 @@ This README guides you through running benchmark tests with the extensive
datasets supported on vLLM. It’s a living document, updated as new features and datasets
become available.

- ## Dataset Overview
+ **Dataset Overview**

<table style="width:100%; border-collapse: collapse;">
<thead>
@@ -82,7 +82,10 @@ become available.
**Note**: For HuggingFace datasets, `dataset-name` should be set to `hf`.

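For instance, a HuggingFace-backed run against an already-running server might look roughly like the sketch below. The flag names mirror the HuggingFace examples later in this README, but the model, dataset path, and split are placeholders, not recommendations.

```bash
# Sketch only: benchmark prompts drawn from a HuggingFace-hosted dataset.
# --dataset-name hf selects the HuggingFace dataset backend; the model,
# dataset path, and split below are placeholder values.
python3 vllm/benchmarks/benchmark_serving.py \
  --backend openai-chat \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path lmms-lab/LLaVA-OneVision-Data \
  --hf-split train \
  --num-prompts 10
```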
---
- ## Example - Online Benchmark
+ <details>
+ <summary><b>🚀 Example - Online Benchmark</b></summary>
+
+ <br />

First, start serving your model:
@@ -130,7 +133,8 @@ P99 ITL (ms): 8.39
==================================================
```

- ### Custom Dataset
+ **Custom Dataset**
+

If the dataset you want to benchmark is not yet supported in vLLM, you can still benchmark it using `CustomDataset`. Your data needs to be in `.jsonl` format, with a "prompt" field per entry, e.g., data.jsonl

```
@@ -162,7 +166,7 @@ python3 benchmarks/benchmark_serving.py --port 9001 --save-result --save-detaile
If your data already includes the chat template, you can skip applying it by passing `--custom-skip-chat-template`.

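Put together, a custom-dataset run with pre-templated prompts might look roughly like the sketch below; the model and file path are placeholders, and the flag set mirrors the custom-dataset example above.

```bash
# Sketch only: benchmark a local .jsonl dataset whose prompts already include
# the chat template, so template application is skipped. Model and path are placeholders.
python3 benchmarks/benchmark_serving.py \
  --backend vllm \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --endpoint /v1/completions \
  --dataset-name custom \
  --dataset-path ./data.jsonl \
  --custom-skip-chat-template \
  --num-prompts 100
```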
- ### VisionArena Benchmark for Vision Language Models
+ **VisionArena Benchmark for Vision Language Models**

```bash
# need a model with vision capability here
@@ -180,7 +184,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 1000
```

- ### InstructCoder Benchmark with Speculative Decoding
+ **InstructCoder Benchmark with Speculative Decoding**

```bash
VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
@@ -197,7 +201,7 @@ python3 benchmarks/benchmark_serving.py \
--num-prompts 2048
```

- ### Other HuggingFaceDataset Examples
+ **Other HuggingFaceDataset Examples**

```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
@@ -251,7 +255,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 80
```

- ### Running With Sampling Parameters
+ **Running With Sampling Parameters**

When using OpenAI-compatible backends such as `vllm`, optional sampling
parameters can be specified. Example client command:
@@ -269,8 +273,12 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 10
```

- ---
- ## Example - Offline Throughput Benchmark
+ </details>
+
+ <details>
+ <summary><b>📈 Example - Offline Throughput Benchmark</b></summary>
+
+ <br />

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
@@ -288,7 +296,7 @@ Total num prompt tokens: 5014
Total num output tokens: 1500
```

- ### VisionArena Benchmark for Vision Language Models
+ **VisionArena Benchmark for Vision Language Models**

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
@@ -308,7 +316,7 @@ Total num prompt tokens: 14527
Total num output tokens: 1280
```

- ### InstructCoder Benchmark with Speculative Decoding
+ **InstructCoder Benchmark with Speculative Decoding**

```bash
VLLM_WORKER_MULTIPROC_METHOD=spawn \
@@ -332,7 +340,7 @@ Total num prompt tokens: 261136
Total num output tokens: 204800
```

- ### Other HuggingFaceDataset Examples
+ **Other HuggingFaceDataset Examples**

**`lmms-lab/LLaVA-OneVision-Data`**
@@ -371,7 +379,7 @@ python3 benchmarks/benchmark_throughput.py \
--num-prompts 10
```

- ### Benchmark with LoRA Adapters
+ **Benchmark with LoRA Adapters**

```bash
# download dataset
@@ -388,18 +396,22 @@ python3 vllm/benchmarks/benchmark_throughput.py \
--lora-path yard1/llama-2-7b-sql-lora-test
```

- ---
- ## Example - Structured Output Benchmark
+ </details>
+
+ <details>
+ <summary><b>🛠️ Example - Structured Output Benchmark</b></summary>
+
+ <br />

Benchmark the performance of structured output generation (JSON, grammar, regex).

- ### Server Setup
+ **Server Setup**

```bash
vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
```

- ### JSON Schema Benchmark
+ **JSON Schema Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -411,7 +423,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### Grammar-based Generation Benchmark
+ **Grammar-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -423,7 +435,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### Regex-based Generation Benchmark
+ **Regex-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -434,7 +446,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### Choice-based Generation Benchmark
+ **Choice-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -445,7 +457,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### XGrammar Benchmark Dataset
+ **XGrammar Benchmark Dataset**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -456,12 +468,16 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ---
- ## Example - Long Document QA Throughput Benchmark
+ </details>
+
+ <details>
+ <summary><b>📚 Example - Long Document QA Benchmark</b></summary>
+
+ <br />

Benchmark the performance of long document question-answering with prefix caching.

- ### Basic Long Document QA Test
+ **Basic Long Document QA Test**

```bash
python3 benchmarks/benchmark_long_document_qa_throughput.py \
@@ -473,7 +489,7 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
--repeat-count 5
```

- ### Different Repeat Modes
+ **Different Repeat Modes**

```bash
# Random mode (default) - shuffle prompts randomly
@@ -504,12 +520,16 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
--repeat-mode interleave
```

- ---
- ## Example - Prefix Caching Benchmark
+ </details>
+
+ <details>
+ <summary><b>🗂️ Example - Prefix Caching Benchmark</b></summary>
+
+ <br />

Benchmark the efficiency of automatic prefix caching.

- ### Fixed Prompt with Prefix Caching
+ **Fixed Prompt with Prefix Caching**

```bash
python3 benchmarks/benchmark_prefix_caching.py \
@@ -520,7 +540,7 @@ python3 benchmarks/benchmark_prefix_caching.py \
--input-length-range 128:256
```

- ### ShareGPT Dataset with Prefix Caching
+ **ShareGPT Dataset with Prefix Caching**

```bash
# download dataset
@@ -535,12 +555,16 @@ python3 benchmarks/benchmark_prefix_caching.py \
--input-length-range 128:256
```

- ---
- ## Example - Request Prioritization Benchmark
+ </details>
+
+ <details>
+ <summary><b>⚡ Example - Request Prioritization Benchmark</b></summary>
+
+ <br />

Benchmark the performance of request prioritization in vLLM.

- ### Basic Prioritization Test
+ **Basic Prioritization Test**

```bash
python3 benchmarks/benchmark_prioritization.py \
@@ -551,7 +575,7 @@ python3 benchmarks/benchmark_prioritization.py \
--scheduling-policy priority
```

- ### Multiple Sequences per Prompt
+ **Multiple Sequences per Prompt**

```bash
python3 benchmarks/benchmark_prioritization.py \
@@ -562,3 +586,5 @@ python3 benchmarks/benchmark_prioritization.py \
--scheduling-policy priority \
--n 2
```
+
+ </details>