Commit 380e018

started collecting benchmarks

- created simple smoke check benchmark script to benchmarks folder
- started collecting benchmarks to directory under the benchmarks
- purpose of these benchmarks is to be able to compare results after component version updates to catch regressions and improvements

#63

Signed-off-by: Mika Laitio <[email protected]>

1 parent a2e6a3d commit 380e018

5 files changed: +101 -0 lines changed

benchmarks/README.md

+10
@@ -0,0 +1,10 @@
+# fast verify benchmark
+
+- run_and_save_benchmarks.sh will execute 2
+relatively fast benchmarks to smoke check and collect results from simple apps
+- todo: add llama.cpp benchmark
+
+# more demanding pytorch-gpu-benchmark
+- https://github.com/lamikr/pytorch-gpu-benchmark
+- please collect the results after execution from the
+new_results folder and create merge request to get them saved to git repository

@@ -0,0 +1,11 @@
+Benchmarking CPU and GPUs
+Pytorch version: 2.4.1-rc1
+ROCM HIP version: 6.1.40093-8099c494c
+Device: cpu-16
+'CPU time: 23.486 sec
+Device: AMD Radeon RX 7700S
+'GPU time: 0.199 sec
+Device: AMD Radeon 780M
+'GPU time: 0.191 sec
+Benchmark ready
+
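
To turn the timings above into a quick CPU-versus-GPU comparison, a small parser is enough. The sketch below is an illustration and not part of this commit: `parse_device_times` and `speedups_vs_first` are hypothetical helper names, and the sample text is copied from the results listed above. It assumes each `Device:` line is followed by exactly one `CPU time:` or `GPU time:` line, and it tolerates the stray leading quote and any diff `+` markers.

```python
import re

# Sample text copied from the results listing above.
SAMPLE = """\
Benchmarking CPU and GPUs
Pytorch version: 2.4.1-rc1
ROCM HIP version: 6.1.40093-8099c494c
Device: cpu-16
'CPU time: 23.486 sec
Device: AMD Radeon RX 7700S
'GPU time: 0.199 sec
Device: AMD Radeon 780M
'GPU time: 0.191 sec
Benchmark ready
"""

# Matches "CPU time: 23.486 sec" or "GPU time: 0.199 sec" anywhere in a line.
TIME_RE = re.compile(r"(CPU|GPU) time:\s*([0-9.]+)\s*sec")


def parse_device_times(text):
    """Return a list of (device_name, seconds) tuples in file order."""
    results = []
    device = None
    for raw in text.splitlines():
        line = raw.lstrip("+").strip()  # drop diff markers and padding
        if line.startswith("Device:"):
            device = line.split(":", 1)[1].strip()
            continue
        match = TIME_RE.search(line)
        if match and device is not None:
            results.append((device, float(match.group(2))))
            device = None
    return results


def speedups_vs_first(results):
    """Ratio of the first entry's time to each later entry's time."""
    _, base_time = results[0]
    return {name: base_time / seconds for name, seconds in results[1:]}


if __name__ == "__main__":
    times = parse_device_times(SAMPLE)
    for name, ratio in speedups_vs_first(times).items():
        print(f"{name}: {ratio:.1f}x faster than {times[0][0]}")
```

The first entry is treated as the baseline only because the CPU run happens to come first in this file; a different ordering would need an explicit baseline name.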
@@ -0,0 +1,51 @@
+Pytorch version: 2.4.1-rc1
+dot product calculation test
+tensor([[[ 0.0769,  1.4105,  0.0824,  0.5644,  0.5710,  0.8619, -0.0698,
+          -0.1378],
+         [-0.0206,  1.3138, -0.5070,  0.3971,  0.5620,  0.8419, -0.2367,
+           0.0135],
+         [-0.1797,  1.3761,  0.0258,  0.5147,  0.5673,  0.7445, -0.0543,
+          -0.0028]],
+
+        [[-0.4074,  0.4956,  0.0553, -0.7740, -0.3718,  1.3344,  0.8070,
+          -0.3321],
+         [-0.5268,  0.5001,  0.0537, -0.6846, -0.3624,  1.1640,  0.6590,
+          -0.2191],
+         [-0.5697,  0.5082,  0.0254, -0.6951, -0.3435,  1.0934,  0.7012,
+          -0.2850]]], device='cuda:0')
+
+Benchmarking cuda and cpu with Default, Math, Flash Attention amd Memory pytorch backends
+Device: AMD Radeon RX 7700S
+Default cuda:0 benchmark:
+24404.260 microseconds, 0.02440425969834905 sec
+Math cuda:0 benchmark:
+71419.426 microseconds, 0.07141942633703972 sec
+Flash Attention cuda:0 benchmark:
+24076.089 microseconds, 0.02407608859939501 sec
+Memory Efficient cuda:0 benchmark:
+24541.843 microseconds, 0.024541843199403956 sec
+Device: cpu-16
+Default cpu benchmark:
+26995025.818 microseconds, 26.99502581800334 sec
+Math cpu benchmark:
+30105574.327 microseconds, 30.105574326997157 sec
+Flash Attention cpu benchmark:
+26501703.386 microseconds, 26.501703385991274 sec
+Memory Efficient cpu benchmark:
+Memory Efficient cpu is not supported. See warnings for reasons.
+Summary
+
+Pytorch version: 2.4.1-rc1
+ROCM HIP version: 6.1.40093-8099c494c
+Device: AMD Radeon RX 7700S
+Default cuda:0: 24404.260 ms
+Math cuda:0: 71419.426 ms
+Flash Attention cuda:0: 24076.089 ms
+Memory Efficient cuda:0: 24541.843 ms
+
+Device: cpu-16
+Default cpu: 26995025.818 ms
+Math cpu: 30105574.327 ms
+Flash Attention cpu: 26501703.386 ms
+Memory Efficient cpu: -1.000 ms
+
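
One detail worth noting when reading this file: the Summary values are numerically identical to the per-benchmark microsecond figures (for example, `Default cuda:0` is reported as 24404.260 microseconds, or about 0.0244 sec, and then as `24404.260 ms`), so the `ms` label in the Summary appears to stand for microseconds, not milliseconds. The sketch below is an illustration and not part of the commit; `parse_benchmarks` and `to_milliseconds` are hypothetical helper names. It reads the detailed `microseconds, ... sec` lines, which carry unambiguous units, and converts them to true milliseconds.

```python
import re

# A few detailed result lines copied from the listing above.
SAMPLE = """\
Default cuda:0 benchmark:
24404.260 microseconds, 0.02440425969834905 sec
Math cuda:0 benchmark:
71419.426 microseconds, 0.07141942633703972 sec
Flash Attention cuda:0 benchmark:
24076.089 microseconds, 0.02407608859939501 sec
Memory Efficient cuda:0 benchmark:
24541.843 microseconds, 0.024541843199403956 sec
"""

HEADER_RE = re.compile(r"^(.*) benchmark:$")
VALUE_RE = re.compile(r"^([0-9.]+) microseconds, ([0-9.eE+-]+) sec$")


def parse_benchmarks(text):
    """Map each benchmark name to its duration in microseconds."""
    results = {}
    name = None
    for raw in text.splitlines():
        line = raw.lstrip("+").strip()  # drop diff markers and padding
        header = HEADER_RE.match(line)
        if header:
            name = header.group(1)
            continue
        value = VALUE_RE.match(line)
        if value and name is not None:
            results[name] = float(value.group(1))
            name = None
    return results


def to_milliseconds(microseconds):
    """Convert microseconds to milliseconds."""
    return microseconds / 1000.0


if __name__ == "__main__":
    for name, micros in parse_benchmarks(SAMPLE).items():
        print(f"{name}: {to_milliseconds(micros):.3f} ms")
```

A backend that prints "is not supported" instead of a timing line is simply left out of the returned mapping, which avoids carrying the `-1.000` sentinel into later comparisons.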
@@ -0,0 +1,6 @@
+20240807_175108_pytorch_dot_products.txt
+- pytorch 2.4.1-rc1
+- amdsmi-fix
+- aotriton gfx110* series tuning data
+- latest deepspeed
+- bitsandbytes, triton and torch_migraphx update

benchmarks/run_and_save_benchmarks.sh

+23
@@ -0,0 +1,23 @@
+#!/bin/bash
+
+# rocm-sdk launcher for test application
+# if test fails on AMD GPU, enable AMD_LOG_LEVEL and HIP_VISIBLE_DEVICES=0 variables
+# to get traces to find the failing code part
+if [ -z $ROCM_HOME ]; then
+    echo "Error, make sure that you have executed"
+    echo " source /opt/rocm_sdk_612/bin/env_rocm.sh"
+    echo "before running this script"
+    exit 1
+fi
+#AMD_LOG_LEVEL=1 HIP_VISIBLE_DEVICES=0 HIP_LAUNCH_BLOCKING=1
+
+DATE_STR=`date '+%Y%m%d_%H%M%S'`;
+echo "Timestamp for benchmark results: ${DATE_STR}"
+FN_PYTORCH_CPU_VS_GPU_SIMPLE_RES="${DATE_STR}_cpu_vs_gpu_simple.txt"
+FN_PYTORCH_DOT_PRODUCT_FLASH_RES="${DATE_STR}_pytorch_dot_products.txt"
+
+echo "Saving to file: $FN_PYTORCH_CPU_VS_GPU_SIMPLE_RES"
+python ../docs/examples/pytorch/pytorch_cpu_vs_gpu_simple_benchmark.py > ${FN_PYTORCH_CPU_VS_GPU_SIMPLE_RES}
+
+echo "Saving to file: $FN_PYTORCH_DOT_PRODUCT_FLASH_RES"
+python ../docs/examples/pytorch/flash_attention/flash_attention_dot_product_benchmark.py > ${FN_PYTORCH_DOT_PRODUCT_FLASH_RES}
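
The commit message says these timestamped result files are meant to be compared after component updates to catch regressions and improvements. The commit itself does not include a comparison tool, so the following is only a sketch of how one might look. `compare_runs` is a hypothetical helper, the 10% tolerance is an arbitrary assumption, and the `candidate` numbers are invented purely for illustration; only the `baseline` values come from the results in this commit.

```python
def compare_runs(baseline, candidate, tolerance=0.10):
    """Classify benchmarks found in both runs.

    Both arguments map a benchmark name to a duration (same unit in both).
    Returns {name: (status, relative_change)}, where status is one of
    "regression", "improvement" or "unchanged". Entries that are missing
    or non-positive (such as the -1.000 "not supported" sentinel) are skipped.
    """
    report = {}
    for name, old in baseline.items():
        new = candidate.get(name)
        if new is None or old <= 0 or new <= 0:
            continue
        change = (new - old) / old
        if change > tolerance:
            status = "regression"  # slower by more than the tolerance
        elif change < -tolerance:
            status = "improvement"  # faster by more than the tolerance
        else:
            status = "unchanged"
        report[name] = (status, change)
    return report


if __name__ == "__main__":
    # Baseline: microsecond values recorded in this commit.
    baseline = {
        "Default cuda:0": 24404.260,
        "Math cuda:0": 71419.426,
        "Flash Attention cuda:0": 24076.089,
        "Memory Efficient cpu": -1.000,
    }
    # Candidate: INVENTED numbers, for illustration only.
    candidate = {
        "Default cuda:0": 24000.0,
        "Math cuda:0": 90000.0,
        "Flash Attention cuda:0": 20000.0,
        "Memory Efficient cpu": -1.000,
    }
    for name, (status, change) in compare_runs(baseline, candidate).items():
        print(f"{name}: {status} ({change:+.1%})")
```

A relative tolerance is used because single-run timings like these fluctuate between executions; a suitable threshold would have to be chosen from repeated runs on the same hardware.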
