Commit 0d21ab6

[Misc] Use collapsible blocks for benchmark examples.

Signed-off-by: reidliu41 <[email protected]>

1 parent 9a3b883 · commit 0d21ab6
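
The change wraps each "Example" section of benchmarks/README.md in the `<details>`/`<summary>` HTML that GitHub-flavored Markdown renders as a collapsible block. A minimal sketch of the pattern, using one summary taken verbatim from the diff below (the placeholder body line is illustrative, not part of the commit):

```markdown
<details>
<summary><b>🚀 Example - Online Benchmark</b></summary>

<br/>

Collapsed section body: the example's prose and fenced code blocks render here once the reader expands the summary.

</details>
```

GitHub keeps the `<summary>` line visible and hides everything up to the closing `</details>` until it is expanded; the `<br/>` simply adds spacing between the summary and the first line of the body.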


benchmarks/README.md

Lines changed: 59 additions & 33 deletions
@@ -4,7 +4,7 @@ This README guides you through running benchmark tests with the extensive
 datasets supported on vLLM. It’s a living document, updated as new features and datasets
 become available.

-## Dataset Overview
+**Dataset Overview**

 <table style="width:100%; border-collapse: collapse;">
 <thead>
@@ -82,7 +82,10 @@ become available.
 **Note**: HuggingFace dataset's `dataset-name` should be set to `hf`

 ---
-## Example - Online Benchmark
+<details>
+<summary><b>🚀 Example - Online Benchmark</b></summary>
+
+<br/>

 First start serving your model

@@ -130,7 +133,8 @@ P99 ITL (ms): 8.39
 ==================================================
 ```

-### Custom Dataset
+**Custom Dataset**
+
 If the dataset you want to benchmark is not supported yet in vLLM, even then you can benchmark on it using `CustomDataset`. Your data needs to be in `.jsonl` format and needs to have "prompt" field per entry, e.g., data.jsonl

 ```
@@ -162,7 +166,7 @@ python3 benchmarks/benchmark_serving.py --port 9001 --save-result --save-detaile

 You can skip applying chat template if your data already has it by using `--custom-skip-chat-template`.

-### VisionArena Benchmark for Vision Language Models
+**VisionArena Benchmark for Vision Language Models**

 ```bash
 # need a model with vision capability here
@@ -180,7 +184,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
   --num-prompts 1000
 ```

-### InstructCoder Benchmark with Speculative Decoding
+**InstructCoder Benchmark with Speculative Decoding**

 ``` bash
 VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
@@ -197,7 +201,7 @@ python3 benchmarks/benchmark_serving.py \
   --num-prompts 2048
 ```

-### Other HuggingFaceDataset Examples
+**Other HuggingFaceDataset Examples**

 ```bash
 vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
@@ -251,7 +255,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
   --num-prompts 80
 ```

-### Running With Sampling Parameters
+**Running With Sampling Parameters**

 When using OpenAI-compatible backends such as `vllm`, optional sampling
 parameters can be specified. Example client command:
@@ -269,8 +273,12 @@ python3 vllm/benchmarks/benchmark_serving.py \
   --num-prompts 10
 ```

----
-## Example - Offline Throughput Benchmark
+</details>
+
+<details>
+<summary><b>📈 Example - Offline Throughput Benchmark</b></summary>
+
+<br/>

 ```bash
 python3 vllm/benchmarks/benchmark_throughput.py \
@@ -288,7 +296,7 @@ Total num prompt tokens: 5014
 Total num output tokens: 1500
 ```

-### VisionArena Benchmark for Vision Language Models
+**VisionArena Benchmark for Vision Language Models**

 ``` bash
 python3 vllm/benchmarks/benchmark_throughput.py \
@@ -308,7 +316,7 @@ Total num prompt tokens: 14527
 Total num output tokens: 1280
 ```

-### InstructCoder Benchmark with Speculative Decoding
+**InstructCoder Benchmark with Speculative Decoding**

 ``` bash
 VLLM_WORKER_MULTIPROC_METHOD=spawn \
@@ -332,7 +340,7 @@ Total num prompt tokens: 261136
 Total num output tokens: 204800
 ```

-### Other HuggingFaceDataset Examples
+**Other HuggingFaceDataset Examples**

 **`lmms-lab/LLaVA-OneVision-Data`**

@@ -371,7 +379,7 @@ python3 benchmarks/benchmark_throughput.py \
   --num-prompts 10
 ```

-### Benchmark with LoRA Adapters
+**Benchmark with LoRA Adapters**

 ``` bash
 # download dataset
@@ -388,18 +396,22 @@ python3 vllm/benchmarks/benchmark_throughput.py \
   --lora-path yard1/llama-2-7b-sql-lora-test
 ```

----
-## Example - Structured Output Benchmark
+</details>
+
+<details>
+<summary><b>🛠️ Example - Structured Output Benchmark</b></summary>
+
+<br/>

 Benchmark the performance of structured output generation (JSON, grammar, regex).

-### Server Setup
+**Server Setup**

 ```bash
 vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
 ```

-### JSON Schema Benchmark
+**JSON Schema Benchmark**

 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -411,7 +423,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
   --num-prompts 1000
 ```

-### Grammar-based Generation Benchmark
+**Grammar-based Generation Benchmark**

 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -423,7 +435,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
   --num-prompts 1000
 ```

-### Regex-based Generation Benchmark
+**Regex-based Generation Benchmark**

 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -434,7 +446,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
   --num-prompts 1000
 ```

-### Choice-based Generation Benchmark
+**Choice-based Generation Benchmark**

 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -445,7 +457,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
   --num-prompts 1000
 ```

-### XGrammar Benchmark Dataset
+**XGrammar Benchmark Dataset**

 ```bash
 python3 benchmarks/benchmark_serving_structured_output.py \
@@ -456,12 +468,16 @@ python3 benchmarks/benchmark_serving_structured_output.py \
   --num-prompts 1000
 ```

----
-## Example - Long Document QA Throughput Benchmark
+</details>
+
+<details>
+<summary><b>📚 Example - Long Document QA Benchmark</b></summary>
+
+<br/>

 Benchmark the performance of long document question-answering with prefix caching.

-### Basic Long Document QA Test
+**Basic Long Document QA Test**

 ```bash
 python3 benchmarks/benchmark_long_document_qa_throughput.py \
@@ -473,7 +489,7 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
   --repeat-count 5
 ```

-### Different Repeat Modes
+**Different Repeat Modes**

 ```bash
 # Random mode (default) - shuffle prompts randomly
@@ -504,12 +520,16 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
   --repeat-mode interleave
 ```

----
-## Example - Prefix Caching Benchmark
+</details>
+
+<details>
+<summary><b>🗂️ Example - Prefix Caching Benchmark</b></summary>
+
+<br/>

 Benchmark the efficiency of automatic prefix caching.

-### Fixed Prompt with Prefix Caching
+**Fixed Prompt with Prefix Caching**

 ```bash
 python3 benchmarks/benchmark_prefix_caching.py \
@@ -520,7 +540,7 @@ python3 benchmarks/benchmark_prefix_caching.py \
   --input-length-range 128:256
 ```

-### ShareGPT Dataset with Prefix Caching
+**ShareGPT Dataset with Prefix Caching**

 ```bash
 # download dataset
@@ -535,12 +555,16 @@ python3 benchmarks/benchmark_prefix_caching.py \
   --input-length-range 128:256
 ```

----
-## Example - Request Prioritization Benchmark
+</details>
+
+<details>
+<summary><b>⚡ Example - Request Prioritization Benchmark</b></summary>
+
+<br/>

 Benchmark the performance of request prioritization in vLLM.

-### Basic Prioritization Test
+**Basic Prioritization Test**

 ```bash
 python3 benchmarks/benchmark_prioritization.py \
@@ -551,7 +575,7 @@ python3 benchmarks/benchmark_prioritization.py \
   --scheduling-policy priority
 ```

-### Multiple Sequences per Prompt
+**Multiple Sequences per Prompt**

 ```bash
 python3 benchmarks/benchmark_prioritization.py \
@@ -562,3 +586,5 @@ python3 benchmarks/benchmark_prioritization.py \
   --scheduling-policy priority \
   --n 2
 ```
+
+</details>
