Task03 Ilya Bolkisev ITMO #340

Closed
wants to merge 2 commits into from

@IlyaBolkisev IlyaBolkisev commented Jan 14, 2025

Local output

Mandelbrot fractal

/home/realist/CLionProjects/GPGPUTasks2024/cmake-build-debug/mandelbrot 1
OpenCL devices:
  Device #0: CPU. Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz. Intel(R) Corporation. Total memory: 32028 Mb
  Device #1: GPU. NVIDIA GeForce RTX 2070 SUPER. Total memory: 7973 Mb
Using device #1: GPU. NVIDIA GeForce RTX 2070 SUPER. Total memory: 7973 Mb
CPU: 0.403556+-0.000521511 s
CPU: 24.7797 GFlops
    Real iterations fraction: 56.2638%
GPU: 0.0015735+-5e-07 s
GPU: 6355.26 GFlops
    Real iterations fraction: 56.2656%
GPU vs CPU average results difference: 0.943475%

Summation of numbers

/home/realist/CLionProjects/GPGPUTasks2024/cmake-build-debug/sum 1
CPU:     0.191234+-9.39835e-05 s
CPU:     522.919 millions/s
CPU OMP: 0.0287707+-0.00298867 s
CPU OMP: 3475.76 millions/s
OpenCL devices:
  Device #0: CPU. Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz. Intel(R) Corporation. Total memory: 32028 Mb
  Device #1: GPU. NVIDIA GeForce RTX 2070 SUPER. Total memory: 7973 Mb
Using device #1: GPU. NVIDIA GeForce RTX 2070 SUPER. Total memory: 7973 Mb
[sum_atomic]
    GPU: 0.00231293+-2.73171e-06 s
    GPU: 43235.1 millions/s
[sum_loop]
    GPU: 0.0021388+-0.000182515 s
    GPU: 46755.2 millions/s
[sum_loop_coalesced]
    GPU: 0.00197785+-0.000119951 s
    GPU: 50560 millions/s
[sum_local]
    GPU: 0.00312395+-4.30912e-05 s
    GPU: 32010.8 millions/s
[sum_tree]
    GPU: 0.0033166+-0.000217515 s
    GPU: 30151.4 millions/s

3.2.6. As expected, the GPU outperforms the CPU many times over. The most efficient method was sum_loop_coalesced: coalesced memory access noticeably reduced the summation time, which was to be expected. However, the methods that use local memory and parallel reduction (sum_local and sum_tree) performed worse than expected; perhaps some parameters were chosen suboptimally, or the code is not fully optimized.
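
The difference between the chunked and the coalesced loop can be illustrated with a small host-side sketch. It is not the code from this PR (the kernels are not shown here); it only emulates, on the CPU, two plausible index layouts. The names `kWorkGroupSize`, `kValuesPerWorkItem`, `chunkedIndex`, `coalescedIndex` and `emulateSum` are hypothetical and the constants are deliberately tiny.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical launch parameters, chosen small for illustration only.
constexpr std::size_t kWorkGroupSize = 4;
constexpr std::size_t kValuesPerWorkItem = 3;

// Chunked layout: work-item `gid` reads its own contiguous block, so at a given
// loop step neighbouring work-items are kValuesPerWorkItem elements apart.
std::size_t chunkedIndex(std::size_t gid, std::size_t step) {
    return gid * kValuesPerWorkItem + step;
}

// Interleaved (coalesced) layout: at a given loop step the work-items of one
// work-group read consecutive elements.
std::size_t coalescedIndex(std::size_t gid, std::size_t step) {
    const std::size_t group = gid / kWorkGroupSize;
    const std::size_t lid = gid % kWorkGroupSize;
    return group * kWorkGroupSize * kValuesPerWorkItem + step * kWorkGroupSize + lid;
}

// Emulates the per-work-item loop on the host and returns the total.
// Out-of-range indices are skipped, as a kernel's bounds check would do.
template <typename IndexFn>
long long emulateSum(const std::vector<int>& data, std::size_t workItems, IndexFn indexOf) {
    // The interleaved layout assumes whole work-groups.
    assert(workItems % kWorkGroupSize == 0);
    long long total = 0;
    for (std::size_t gid = 0; gid < workItems; ++gid) {
        for (std::size_t step = 0; step < kValuesPerWorkItem; ++step) {
            const std::size_t idx = indexOf(gid, step);
            if (idx < data.size()) total += data[idx];
        }
    }
    return total;
}
```

Both layouts visit every element exactly once, so `emulateSum(data, workItems, chunkedIndex)` and `emulateSum(data, workItems, coalescedIndex)` return the same total. What changes is the distance between the addresses that neighbouring work-items touch at the same step: `kValuesPerWorkItem` elements in the chunked layout versus one element in the interleaved one. On a GPU, that unit stride is what lets the hardware merge the reads of adjacent work-items into fewer memory transactions.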

GitHub CI output

Mandelbrot fractal

Run ./mandelbrot
OpenCL devices:
  Device #0: CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15991 Mb
Using device #0: CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15991 Mb
CPU: 0.601875+-0.00135832 s
CPU: 16.6148 GFlops
    Real iterations fraction: 56.2638%
GPU: 0.165415+-0.00524466 s
GPU: 60.454 GFlops
    Real iterations fraction: 56.263%
GPU vs CPU average results difference: 1.60179%

Summation of numbers

Run ./sum
CPU:     0.0320988+-5.97785e-05 s
CPU:     3115.38 millions/s
CPU OMP: 0.0179577+-0.000722665 s
CPU OMP: 5568.65 millions/s
OpenCL devices:
  Device #0: CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15991 Mb
Using device #0: CPU. AMD EPYC 7763 64-Core Processor                . Intel(R) Corporation. Total memory: 15991 Mb
[sum_atomic]
    GPU: 1.47873+-0.0119956 s
    GPU: 67.6254 millions/s
[sum_loop]
    GPU: 1.55627+-0.0109116 s
    GPU: 64.2562 millions/s
[sum_loop_coalesced]
    GPU: 1.5572+-0.0119588 s
    GPU: 64.2179 millions/s
[sum_local]
    GPU: 0.129938+-0.000299714 s
    GPU: 769.599 millions/s
[sum_tree]
    GPU: 0.132342+-0.000164108 s
    GPU: 755.62 millions/s

@simiyutin simiyutin closed this Jan 14, 2025
@simiyutin
Collaborator

Task accepted
