ryujaehun · lamikr · Jun 7, 2024 · Jun 7, 2024 · Jun 25, 2024 · Jul 21, 2024
diff --git a/README.md b/README.md
@@ -1,6 +1,49 @@
 # About
 Comparison of learning and inference speed of different GPU with various CNN models in __pytorch__
+List of tested AMD and NVIDIA GPUs:
 
+### Example Results
+
+The following benchmark results were generated with the command: ./show_benchmarks_resuls.sh
+The graph shows the 7700S results with both pytorch 2.3.1 and pytorch 2.4.0.
+ROCM SDK Builder's pytorch 2.4.0 contains optimized flashattention support for the
+AMD RX 7700S (and other gfx1100/gfx1101/gfx1102 and gfx1103 cards).
+
+![Resnet Benchmark for Half-type](fig/comparison/resnet_benchmarks.png  "Pytorch with AMD GPU")
+
+# Benchmark Execution
+
+### Benchmarking All GPUs
+
+This command uses pytorch to detect all GPUs, runs the benchmark
+on each of them separately, and finally runs
+the benchmark that uses all of the GPUs together:
+
+    ./run_benchmarks.sh
+
+### Benchmarking One GPU
+
+This command runs the benchmark on a single GPU, selected with the `-i` parameter:
+
+    python3 benchmark_models.py -i 1 -g 1
+
+GPU indices start at 0: the first GPU has index 0, the second has index 1, and so on.
+
+### Benchmark Results
+
+* New results are stored under the "new_results" folder
+* Existing results are under the "results" folder
+* After running the benchmarks, you can create a pull request on GitHub
+  to get them merged
+* You can view the results of a new benchmark by adding the name of its result
+  file to plot_benchmarks.py and then running the show_benchmarks.sh script.
+
+## List of Benchmarked GPUs
+
+* AMD_Radeon_RX_6800
+* AMD_Radeon_RX_7900_XTX
+* AMD_Radeon_RX_7700S (Framework 16 laptop discrete GPU)
+* AMD_Radeon_780M     (Framework 16 laptop iGPU)
 * 1080TI
 * TITAN XP
 * TITAN V
@@ -42,11 +85,11 @@ ResNet152, DenseNet121, DenseNet169, DenseNet201, DenseNet161 mobilenet mnasnet
 
 ## Usage
 
-`./test.sh`
+`./run_benchmarks.sh`
 
 ## Results
 
-###  requirement
+### Requirements
 * python>=3.6(for f-formatting)
 * torchvision
 * torch>=1.0.0
@@ -55,18 +98,24 @@ ResNet152, DenseNet121, DenseNet169, DenseNet201, DenseNet161 mobilenet mnasnet
 * plotly(for plot)
 * cufflinks(for plot)
 
-
 ### Environment
 
-* Pytorch version `1.4`
+* Pytorch version `2.3`
 * Number of GPUs on current device `4`
 * CUDA version = `10.0`
 * CUDNN version= `7601`
 * `nvcr.io/nvidia/pytorch:20.10-py3` (docker container in A100 and 3090)
 
-
-
 ### Change Log
+* 2024/07/22
+  * Benchmarks can now also be run on AMD GPUs
+  * The ./run_benchmarks.sh script now uses pytorch to query the GPU count;
+    it first runs the tests on each device separately and then
+    on all GPUs simultaneously
+  * New benchmark results are saved to the `new_results/<gpu_index>/<gpu_name>` folder
+  * Added a new "-i" option that specifies which GPU to use
+  * If no GPU index is given with the -i option but the total GPU count given
+    with the -g option is greater than 1, the tests run on all GPUs simultaneously
 * 2021/02/27
   * Addition result in RTX3090
   * Addition result in RTX2060(thanks for gutama)