Writing and optimizing CUDA kernels is currently difficult because there is no proper, consistent way to measure kernel performance.
A simple and consistent tool to measure and profile CUDA kernels is required.
Requirements
Automatic measurement of FLOPS (probably using nvprof)
Measurement of parallel scaling
Simple, minimal API
Plotting of benchmark reports (probably using matplotlib's pyplot or gnuplot)
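One way these requirements could fit together is sketched below in Python, since the reporting side already points at pyplot. It is a minimal sketch under several assumptions. `bench`, `BenchResult`, `scaling` and `plot_scaling` are hypothetical names, not an existing library. The FLOP count is supplied by the caller (for example 2 * N^3 for a naive N x N matrix multiply) instead of being read from nvprof. "Parallel scaling" is taken to mean strong scaling, where the same problem is run at increasing parallel widths. Because CUDA kernel launches are asynchronous, the caller passes a `sync` callable that blocks until the GPU is idle.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class BenchResult:
    name: str
    width: int        # parallel width chosen by the caller (blocks, GPUs, ...)
    median_s: float   # median wall-clock seconds per call
    gflops: float     # flop_count / median_s / 1e9


def bench(name: str, fn: Callable[[], object], flop_count: float,
          width: int = 1, repeats: int = 20, warmup: int = 3,
          sync: Optional[Callable[[], object]] = None,
          clock: Callable[[], float] = time.perf_counter) -> BenchResult:
    """Time fn() `repeats` times and report the median time and GFLOP/s.

    Hypothetical API. `sync` should block until queued GPU work is done
    (e.g. torch.cuda.synchronize); `clock` is injectable for testing.
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):  # exclude one-off costs (context init, JIT)
        fn()
    sync()  # drain warm-up work before timing starts
    times: List[float] = []
    for _ in range(repeats):
        t0 = clock()
        fn()
        sync()  # launches are asynchronous: wait for the kernel to finish
        times.append(clock() - t0)
    med = statistics.median(times)
    gflops = flop_count / med / 1e9 if med > 0 else float("inf")
    return BenchResult(name, width, med, gflops)


def scaling(results: List[BenchResult]) -> List[Tuple[int, float, float]]:
    """Strong-scaling table of (width, speedup, efficiency).

    Speedup and efficiency are relative to the narrowest run; assumes every
    result timed the same total problem size.
    """
    ordered = sorted(results, key=lambda r: r.width)
    base = ordered[0]
    table = []
    for r in ordered:
        speedup = base.median_s / r.median_s
        efficiency = speedup / (r.width / base.width)
        table.append((r.width, speedup, efficiency))
    return table


def plot_scaling(table: List[Tuple[int, float, float]],
                 path: str = "scaling.png") -> None:
    """Plot speedup and efficiency against parallel width (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display needed
    import matplotlib.pyplot as plt

    widths = [w for w, _, _ in table]
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.plot(widths, [s for _, s, _ in table], marker="o")
    ax1.set(xlabel="parallel width", ylabel="speedup")
    ax2.plot(widths, [e for _, _, e in table], marker="o")
    ax2.set(xlabel="parallel width", ylabel="efficiency")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

With PyTorch available, a call such as `bench("matmul", lambda: a @ b, flop_count=2 * n**3, sync=torch.cuda.synchronize)` would time one GEMM per repeat. Reporting the median rather than the mean keeps a single slow outlier from skewing the result. For very short kernels, host-side timing also includes launch overhead, so CUDA events (`cudaEventRecord` / `cudaEventElapsedTime`) would be the more precise choice.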
Make an efficient CUDA micro-benchmark framework