Collection of pytorch gpu benchmark results #63

Closed
lamikr opened this issue Jun 7, 2024 · 9 comments

@lamikr
Owner

lamikr commented Jun 7, 2024

Extensive GPU benchmarks on AMD GPUs can now be run with the following steps after building the ROCm SDK. This version has been synced with upstream, which fixed the PyTorch 2.0 support in a different way than I had done earlier, and it now runs all the tests without raising exceptions.

git clone https://github.com/lamikr/pytorch-gpu-benchmark
cd pytorch-gpu-benchmark
source /opt/rocm_sdk_611/bin/env_rocm.sh
./test.sh

It would be nice to collect results from different computers and create some comparison graphs.
On my AMD RX 6800 the test run took about 50 minutes, and the results were saved to
the result folder as the following 8 files.

'AMD Radeon RX 6800_1_gpus__double_model_inference_benchmark.csv'  'AMD Radeon RX 6800_1_gpus__half_model_inference_benchmark.csv'
'AMD Radeon RX 6800_1_gpus__double_model_train_benchmark.csv'      'AMD Radeon RX 6800_1_gpus__half_model_train_benchmark.csv'
'AMD Radeon RX 6800_1_gpus__float_model_inference_benchmark.csv'    config.json
'AMD Radeon RX 6800_1_gpus__float_model_train_benchmark.csv'        system_info.txt

I have now stored the files from my benchmark run in the results/AMD_Radeon_RX_6800 folder of the gpu benchmark repository.
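For anyone sorting results from several machines, the file names listed above can be split into GPU name, GPU count, precision and mode. This is only a minimal sketch: the pattern is inferred from the eight names in this comment, and `ResultFile` and `parse_result_name` are hypothetical helpers, not part of pytorch-gpu-benchmark. Files stored under results/ may follow a different naming scheme.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the file names listed in this comment, e.g.
# "AMD Radeon RX 6800_1_gpus__half_model_train_benchmark.csv".
# It is an assumption, not something the benchmark documents.
_RESULT_NAME = re.compile(
    r"^(?P<gpu>.+)_(?P<count>\d+)_gpus__"
    r"(?P<precision>double|float|half)_model_"
    r"(?P<mode>train|inference)_benchmark\.csv$"
)


class ResultFile(NamedTuple):
    """Hypothetical container for the parts of a result file name."""
    gpu: str
    gpu_count: int
    precision: str
    mode: str


def parse_result_name(filename: str) -> Optional[ResultFile]:
    """Return the parsed parts, or None for files such as config.json."""
    match = _RESULT_NAME.match(filename)
    if match is None:
        return None
    return ResultFile(
        match["gpu"], int(match["count"]), match["precision"], match["mode"]
    )


print(parse_result_name("AMD Radeon RX 6800_1_gpus__half_model_train_benchmark.csv"))
print(parse_result_name("config.json"))
```

Returning None for non-matching names lets a caller loop over the whole result folder and skip config.json and system_info.txt without special cases.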

So if you have run the tests, could you send the results as a pull request? At the moment the plot.ipynb code, which should read the CSV files and generate pictures, seems to be broken, so that needs to be fixed...
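Until the plotting is fixed, a result CSV can at least be summarized with the standard library. The sketch below assumes each CSV has one column per model, one row per measurement, and possibly an unnamed index column in front (the layout pandas writes by default). That layout is an assumption and should be checked against a real result file. `mean_per_column` is a hypothetical helper, and the sample data is made up.

```python
import csv
import io
from statistics import mean


def mean_per_column(csv_file):
    """Return {column name: mean of its numeric values} for an open CSV file."""
    reader = csv.DictReader(csv_file)
    samples = {}
    for row in reader:
        for name, value in row.items():
            if not name:
                # Skip the unnamed index column (and any overflow cells).
                continue
            try:
                samples.setdefault(name, []).append(float(value))
            except (TypeError, ValueError):
                # Ignore empty or non-numeric cells.
                continue
    # Columns that never produced a numeric value are left out.
    return {name: mean(values) for name, values in samples.items() if values}


# Made-up sample data for illustration only, not real benchmark results.
sample = io.StringIO(",model_a,model_b\n0,10.0,20.0\n1,14.0,22.0\n")
print(mean_per_column(sample))
```

For a real file, open it with `open(path, newline="")` and pass the file object in place of `sample`.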

@eitch

eitch commented Jun 25, 2024

Hi @lamikr I've added my tests: ryujaehun/pytorch-gpu-benchmark#31

Should I send the PR to your fork?

@eitch

eitch commented Jun 25, 2024

I now also sent the PR to your fork. I sure hope someone can fix the plotting.

@lamikr
Owner Author

lamikr commented Jun 27, 2024

Thanks eitch, I noticed your 7900 XTX benchmarks today and merged the results into that repo.
I am not sure whether the upstream of that benchmark is still active.

Another test I run quite often is the ViT example from this repo:
https://github.com/BrianPulfer/PapersReimplementations.git (dir src/cv/vit)
with docs at
https://medium.com/@brianpulfer/vision-transformers-from-scratch-pytorch-a-step-by-step-guide-96c3313c2e0c

Then this one should contain all kinds of useful things:

https://github.com/ROCm/ROCmValidationSuite/blob/master/docs/ug1main.md

@lamikr
Owner Author

lamikr commented Jun 27, 2024

I have not integrated TensorFlow back. Debugging it is very time-consuming when something goes wrong, because it always likes to trigger a rebuild of everything instead of letting you fix one thing and continue.

But part of TensorFlow is a tool called TensorBoard, which is pretty nice. I just tested installing it with "pip install tensorboard" and it seemed to work OK without messing up the Python dependencies.
So that's one alternative, and maybe at some point TensorFlow could be added back to rocm sdk builder.

lamikr added a commit to lamikr/pytorch-gpu-benchmark that referenced this issue Jul 21, 2024
- allow specifying gpu-index parameter in addition
  to gpu-count parameter.
- gpu index parameter can be used to request the benchmarks
  to be run only on a certain gpu index in multi-gpu case
- if more than one gpu, run benchmarks separately for each
  and then in the end run tests with all gpus used at the same time
- fixes for: lamikr/rocm_sdk_builder#63

Signed-off-by: Mika Laitio <[email protected]>
@lamikr
Owner Author

lamikr commented Jul 21, 2024

The pytorch gpu benchmark can now run tests for each GPU separately and/or using all GPUs together in multi-GPU systems.
(See the new -i option in addition to the -g option.)
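The run order described in the commit message can be sketched as follows. This illustrates the described behavior and is not the code from the benchmark itself: `plan_gpu_runs` is a hypothetical helper, and mapping -g to the GPU count and -i to the GPU index is my reading of the commit message.

```python
from typing import List, Optional


def plan_gpu_runs(gpu_count: int, gpu_index: Optional[int] = None) -> List[List[int]]:
    """Return the list of runs, each run being the GPU indices it uses.

    gpu_count is assumed to correspond to -g and gpu_index to -i.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if gpu_index is not None:
        if not 0 <= gpu_index < gpu_count:
            raise ValueError("gpu_index must be in range 0..gpu_count-1")
        # Only the requested GPU is benchmarked.
        return [[gpu_index]]
    # One run per GPU first ...
    runs = [[index] for index in range(gpu_count)]
    # ... and, with more than one GPU, a final run using all of them together.
    if gpu_count > 1:
        runs.append(list(range(gpu_count)))
    return runs


print(plan_gpu_runs(2))     # [[0], [1], [0, 1]]
print(plan_gpu_runs(3, 1))  # [[1]]
```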

@lamikr
Owner Author

lamikr commented Aug 3, 2024

I wrote a small Python plotter to show the benchmark results... Not surprisingly, the 7900 XTX from @eitch leads...

[Image: pytorch_gpu_benchmark]

@lamikr
Owner Author

lamikr commented Aug 3, 2024

Benchmark result pictures can now be generated pretty easily by modifying the list of result files selected for plotting in

plot_benchmark_results.py

_result_filename_arr = [
                "results/AMD/AMD_Radeon_RX_7700S/AMD_Radeon_RX_7700S__half_model_train_benchmark.csv",
                "results/Nvidia/GeForce_GTX_1080_TI/GeForce_GTX_1080_TI__half_model_training_benchmark.csv",

and then plotting them by launching

./show_benchmark_results.sh

https://github.com/lamikr/pytorch-gpu-benchmark/

@lamikr
Owner Author

lamikr commented Aug 8, 2024

I added simple smoke-check benchmarks to the benchmark directory. I will try to use these first to detect whether
the aotriton update, to a version with the gfx11* series flash attention update and gfx11-series GPU tuning data, speeds things up.
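A before/after comparison of this kind can be sketched as below. It assumes each run has been reduced to a mapping from benchmark name to a time where lower is better. `compare_runs`, the 5% tolerance and the example numbers are all hypothetical and not taken from the smoke-check scripts.

```python
def compare_runs(before, after, tolerance=0.05):
    """Classify each benchmark present in both runs.

    before/after map benchmark name -> time (lower is better, an assumption).
    Returns {name: (relative change, verdict)}.
    """
    report = {}
    for name in sorted(before.keys() & after.keys()):
        old, new = before[name], after[name]
        if old <= 0:
            # A relative change is meaningless without a positive baseline.
            continue
        change = (new - old) / old
        if change > tolerance:
            verdict = "regression"
        elif change < -tolerance:
            verdict = "improvement"
        else:
            verdict = "unchanged"
        report[name] = (change, verdict)
    return report


# Made-up numbers for illustration only.
old_run = {"bench_a": 100.0, "bench_b": 100.0, "bench_c": 100.0}
new_run = {"bench_a": 120.0, "bench_b": 80.0, "bench_c": 101.0}
for name, (change, verdict) in compare_runs(old_run, new_run).items():
    print(f"{name}: {change:+.1%} {verdict}")
```

The tolerance keeps ordinary run-to-run noise from being reported as a change. A suitable value depends on how stable the measurements are on a given machine.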

lamikr added a commit that referenced this issue Aug 8, 2024
- created simple smoke check benchmark
  script to benchmarks folder
- started collecting benchmarks to
  directory under the benchmarks
- purpose of these benchmarks is
  to be able to compare results after
  component version updates to catch
  regressions and improvements

#63

Signed-off-by: Mika Laitio <[email protected]>
@lamikr
Owner Author

lamikr commented Oct 9, 2024

The benchmark is now documented in README.md, with a graph included. Closing this now.

@lamikr lamikr closed this as completed Oct 9, 2024