by jeeceebees
1681 days ago
How does the performance between GPU programs written with std::par compare to those written in CUDA? Do you happen to know of any online resources that show a comparison of the kernel code and performance of the two frameworks on common tasks?
In Table 3, the first and last columns show the performance of CUDA and std::par as a percentage of theoretical peak.
The rows show results for different GPU architectures.
On V100, CUDA achieves 62% of theoretical peak and std::par 58%.
Given how little developer effort it takes to get over 50% of theoretical peak with std::par, it's a no-brainer IMO.
If there is one kernel where you need more performance, you can always implement that kernel in CUDA, but for 99% of the kernels in your program your time might be better spent elsewhere.
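To give a sense of what the std::par side looks like, here is a minimal sketch of a SAXPY written with the C++17 parallel algorithms. The `saxpy` helper and the choice of SAXPY as the example are my own for illustration; they are not taken from the paper or its benchmarks. As far as I know, building this with `nvc++ -stdpar=gpu` offloads the algorithm to the GPU, while other compilers just run it on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Illustrative SAXPY: y[i] = a * x[i] + y[i].
// Hypothetical helper for this comment, not code from the paper.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    // par_unseq lets the implementation parallelize and vectorize the loop;
    // writing the result back into y is allowed for std::transform.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

There is no explicit kernel launch, grid/block sizing, or device memory management in the source, which is where most of the saved effort comes from. The equivalent CUDA version would need a `__global__` kernel plus the launch and allocation code around it.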