
Commit 16c8185

Updated README.md with config info and header font size (#473)

* Updated README.md with config info and header font size
* Changed AITER example to in 128 out 2048 bs 256
* Reorg whats new added changelog

Co-authored-by: arakowsk-amd <[email protected]>

1 parent 0f2300e commit 16c8185

File tree

1 file changed: +29 −19 lines changed

docs/dev-docker/README.md (+29 −19)
```diff
@@ -21,26 +21,12 @@ Pull the most recent validated docker image with `docker pull rocm/vllm-dev:main
 
 ## What is New
 
-20250305_aiter:
-- AITER improvements
+- [Experimental AITER support](#aiter-use-cases)
+- [Experimental DeepSeek-V3 and DeepSeek-R1 support](#running-deepseek-v3-and-deepseek-r1)
+- Performance improvement for custom paged attention
 - Support for FP8 skinny GEMM
-
-20250207_aiter:
-- More performant AITER
 - Bug fixes
 
-20250205_aiter:
-- [AITER](https://github.com/ROCm/aiter) support
-- Performance improvement for custom paged attention
-- Reduced memory overhead bug fix
-
-20250124:
-- Fix accuracy issue with 405B FP8 Triton FA
-- Fixed accuracy issue with TP8
-
-20250117:
-- [Experimental DeepSeek-V3 and DeepSeek-R1 support](#running-deepseek-v3-and-deepseek-r1)
-
 ## Performance Results
 
 The data in the following tables is a reference point to help users validate observed performance. It should not be considered as the peak performance that can be delivered by AMD Instinct™ MI300X accelerator with vLLM. See the MLPerf section in this document for information about MLPerf 4.1 inference results. The performance numbers above were collected using the steps below.
```
```diff
@@ -62,7 +48,7 @@ The table below shows performance data where a local inference client is fed req
 
 *TP stands for Tensor Parallelism.*
 
-## Latency Measurements
+### Latency Measurements
 
 The table below shows latency measurement, which typically involves assessing the time from when the system receives an input to when the model produces a result.
 
```
```diff
@@ -103,6 +89,8 @@ The table below shows latency measurement, which typically involves assessing th
 
 *TP stands for Tensor Parallelism.*
 
+Supermicro AS-8125GS-TNMR2 with 2x AMD EPYC 9554 Processors, 2.25 TiB RAM, 8x AMD Instinct MI300X (192GiB, 750W) GPUs, Ubuntu 22.04, and amdgpu driver 6.8.5
+
 ## Reproducing Benchmarked Results
 
 ### Preparation - Obtaining access to models
```
````diff
@@ -459,7 +447,7 @@ Some use cases include:
 
 ```bash
 export VLLM_USE_AITER=1
-python3 /app/vllm/benchmarks/benchmark_latency.py --model amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV -tp 8 --batch-size 256 --input-len 1024 --output-len 128
+python3 /app/vllm/benchmarks/benchmark_latency.py --model amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV -tp 8 --batch-size 256 --input-len 128 --output-len 2048
 ```
 
 ## MMLU_PRO_Biology Accuracy Evaluation
````
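The AITER benchmark invocation above pins a single configuration (batch size 256, input 128, output 2048). When comparing several sequence-length configurations, it can help to generate the command lines programmatically. A small Python sketch, where the path and model name are taken from the command above and the sweep values are purely illustrative:

```python
BENCH = "python3 /app/vllm/benchmarks/benchmark_latency.py"
MODEL = "amd/Mixtral-8x22B-Instruct-v0.1-FP8-KV"


def latency_cmd(batch_size: int, input_len: int, output_len: int) -> str:
    """Build one benchmark_latency.py invocation string."""
    return (f"{BENCH} --model {MODEL} -tp 8 "
            f"--batch-size {batch_size} "
            f"--input-len {input_len} --output-len {output_len}")


# Print commands for a couple of illustrative input/output length pairs.
for in_len, out_len in [(128, 2048), (1024, 128)]:
    print(latency_cmd(256, in_len, out_len))
```

The printed commands can then be run (or fed to a scheduler) one at a time inside the container.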
````diff
@@ -509,3 +497,25 @@ Use AITER release candidate branch instead:
 git checkout aiter_integration_final
 docker build -f Dockerfile.rocm -t <your_tag> --build-arg USE_CYTHON=1 .
 ```
+
+## Changelog
+
+20250305_aiter:
+- AITER improvements
+- Support for FP8 skinny GEMM
+
+20250207_aiter:
+- More performant AITER
+- Bug fixes
+
+20250205_aiter:
+- [AITER](https://github.com/ROCm/aiter) support
+- Performance improvement for custom paged attention
+- Reduced memory overhead bug fix
+
+20250124:
+- Fix accuracy issue with 405B FP8 Triton FA
+- Fixed accuracy issue with TP8
+
+20250117:
+- [Experimental DeepSeek-V3 and DeepSeek-R1 support](#running-deepseek-v3-and-deepseek-r1)
````
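The checkout-and-build step in the last hunk assumes you are already inside a vLLM checkout. As a sketch, the full sequence might look like the following; the clone URL is an assumption (the ROCm vLLM fork), while the checkout and build lines come from the hunk itself, with `<your_tag>` left as a placeholder:

```shell
# Sketch of the full AITER image build sequence.
# The clone URL below is an assumption; checkout/build lines are from the doc.
git clone https://github.com/ROCm/vllm.git
cd vllm
git checkout aiter_integration_final
docker build -f Dockerfile.rocm -t <your_tag> --build-arg USE_CYTHON=1 .
```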
