@@ -4,7 +4,7 @@ This README guides you through running benchmark tests with the extensive
datasets supported on vLLM. It’s a living document, updated as new features and datasets
become available.

- ## Dataset Overview
+ **Dataset Overview**

<table style="width:100%; border-collapse: collapse;">
<thead>
@@ -82,7 +82,10 @@ become available.
**Note**: For HuggingFace datasets, `dataset-name` should be set to `hf`.

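For instance, a HuggingFace-backed run against an already-running server might look roughly like the sketch below. The flag names mirror the HuggingFace examples later in this README, but the model, dataset path, and split are placeholders, not recommendations.

```bash
# Sketch only: benchmark prompts drawn from a HuggingFace-hosted dataset.
# --dataset-name hf selects the HuggingFace dataset backend; the model,
# dataset path, and split below are placeholder values.
python3 vllm/benchmarks/benchmark_serving.py \
  --backend openai-chat \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --endpoint /v1/chat/completions \
  --dataset-name hf \
  --dataset-path lmms-lab/LLaVA-OneVision-Data \
  --hf-split train \
  --num-prompts 10
```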
---
- ## Example - Online Benchmark
+ <details>
+ <summary><b>🚀 Example - Online Benchmark</b></summary>
+
+ <br />

First, start serving your model:
@@ -130,7 +133,8 @@ P99 ITL (ms): 8.39
==================================================
```

- ### Custom Dataset
+ **Custom Dataset**
+

If the dataset you want to benchmark is not yet supported in vLLM, you can still benchmark it using `CustomDataset`. Your data needs to be in `.jsonl` format, with a "prompt" field per entry, e.g., data.jsonl

```
@@ -162,7 +166,7 @@ python3 benchmarks/benchmark_serving.py --port 9001 --save-result --save-detaile
If your data already includes the chat template, you can skip applying it by passing `--custom-skip-chat-template`.

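Put together, a custom-dataset run with pre-templated prompts might look roughly like the sketch below; the model and file path are placeholders, and the flag set mirrors the custom-dataset example above.

```bash
# Sketch only: benchmark a local .jsonl dataset whose prompts already include
# the chat template, so template application is skipped. Model and path are placeholders.
python3 benchmarks/benchmark_serving.py \
  --backend vllm \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --endpoint /v1/completions \
  --dataset-name custom \
  --dataset-path ./data.jsonl \
  --custom-skip-chat-template \
  --num-prompts 100
```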
- ### VisionArena Benchmark for Vision Language Models
+ **VisionArena Benchmark for Vision Language Models**

```bash
# need a model with vision capability here
@@ -180,7 +184,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 1000
```

- ### InstructCoder Benchmark with Speculative Decoding
+ **InstructCoder Benchmark with Speculative Decoding**

```bash
VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
@@ -197,7 +201,7 @@ python3 benchmarks/benchmark_serving.py \
--num-prompts 2048
```

- ### Other HuggingFaceDataset Examples
+ **Other HuggingFaceDataset Examples**

```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
@@ -251,7 +255,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 80
```

- ### Running With Sampling Parameters
+ **Running With Sampling Parameters**

When using OpenAI-compatible backends such as `vllm`, optional sampling
parameters can be specified. Example client command:
@@ -269,8 +273,12 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 10
```

- ---
- ## Example - Offline Throughput Benchmark
+ </details>
+
+ <details>
+ <summary><b>📈 Example - Offline Throughput Benchmark</b></summary>
+
+ <br />

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
@@ -288,7 +296,7 @@ Total num prompt tokens: 5014
Total num output tokens: 1500
```

- ### VisionArena Benchmark for Vision Language Models
+ **VisionArena Benchmark for Vision Language Models**

```bash
python3 vllm/benchmarks/benchmark_throughput.py \
@@ -308,7 +316,7 @@ Total num prompt tokens: 14527
Total num output tokens: 1280
```

- ### InstructCoder Benchmark with Speculative Decoding
+ **InstructCoder Benchmark with Speculative Decoding**

```bash
VLLM_WORKER_MULTIPROC_METHOD=spawn \
@@ -332,7 +340,7 @@ Total num prompt tokens: 261136
Total num output tokens: 204800
```

- ### Other HuggingFaceDataset Examples
+ **Other HuggingFaceDataset Examples**

**`lmms-lab/LLaVA-OneVision-Data`**
@@ -371,7 +379,7 @@ python3 benchmarks/benchmark_throughput.py \
--num-prompts 10
```

- ### Benchmark with LoRA Adapters
+ **Benchmark with LoRA Adapters**

```bash
# download dataset
@@ -388,18 +396,22 @@ python3 vllm/benchmarks/benchmark_throughput.py \
--lora-path yard1/llama-2-7b-sql-lora-test
```

- ---
- ## Example - Structured Output Benchmark
+ </details>
+
+ <details>
+ <summary><b>🛠️ Example - Structured Output Benchmark</b></summary>
+
+ <br />

Benchmark the performance of structured output generation (JSON, grammar, regex).

- ### Server Setup
+ **Server Setup**

```bash
vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
```

- ### JSON Schema Benchmark
+ **JSON Schema Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -411,7 +423,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### Grammar-based Generation Benchmark
+ **Grammar-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -423,7 +435,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### Regex-based Generation Benchmark
+ **Regex-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -434,7 +446,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### Choice-based Generation Benchmark
+ **Choice-based Generation Benchmark**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -445,7 +457,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ### XGrammar Benchmark Dataset
+ **XGrammar Benchmark Dataset**

```bash
python3 benchmarks/benchmark_serving_structured_output.py \
@@ -456,12 +468,16 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000
```

- ---
- ## Example - Long Document QA Throughput Benchmark
+ </details>
+
+ <details>
+ <summary><b>📚 Example - Long Document QA Benchmark</b></summary>
+
+ <br />

Benchmark the performance of long document question-answering with prefix caching.

- ### Basic Long Document QA Test
+ **Basic Long Document QA Test**

```bash
python3 benchmarks/benchmark_long_document_qa_throughput.py \
@@ -473,7 +489,7 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
--repeat-count 5
```

- ### Different Repeat Modes
+ **Different Repeat Modes**

```bash
# Random mode (default) - shuffle prompts randomly
@@ -504,12 +520,16 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
--repeat-mode interleave
```

- ---
- ## Example - Prefix Caching Benchmark
+ </details>
+
+ <details>
+ <summary><b>🗂️ Example - Prefix Caching Benchmark</b></summary>
+
+ <br />

Benchmark the efficiency of automatic prefix caching.

- ### Fixed Prompt with Prefix Caching
+ **Fixed Prompt with Prefix Caching**

```bash
python3 benchmarks/benchmark_prefix_caching.py \
@@ -520,7 +540,7 @@ python3 benchmarks/benchmark_prefix_caching.py \
--input-length-range 128:256
```

- ### ShareGPT Dataset with Prefix Caching
+ **ShareGPT Dataset with Prefix Caching**

```bash
# download dataset
@@ -535,12 +555,16 @@ python3 benchmarks/benchmark_prefix_caching.py \
--input-length-range 128:256
```

- ---
- ## Example - Request Prioritization Benchmark
+ </details>
+
+ <details>
+ <summary><b>⚡ Example - Request Prioritization Benchmark</b></summary>
+
+ <br />

Benchmark the performance of request prioritization in vLLM.

- ### Basic Prioritization Test
+ **Basic Prioritization Test**

```bash
python3 benchmarks/benchmark_prioritization.py \
@@ -551,7 +575,7 @@ python3 benchmarks/benchmark_prioritization.py \
--scheduling-policy priority
```

- ### Multiple Sequences per Prompt
+ **Multiple Sequences per Prompt**

```bash
python3 benchmarks/benchmark_prioritization.py \
@@ -562,3 +586,5 @@ python3 benchmarks/benchmark_prioritization.py \
--scheduling-policy priority \
--n 2
```
+
+ </details>