by jeeceebees 1681 days ago
How does the performance of GPU programs written with std::par compare to that of programs written in CUDA?

Do you happen to know of any online resources that show a comparison of the kernel code and performance of the two frameworks on common tasks?

1 comment

This paper ported a CFD application that already had a tuned CUDA implementation to std::par: https://arxiv.org/pdf/2010.11751.pdf

In Table 3, the first and last columns show the performance of CUDA and std::par as a percentage of theoretical peak.

The rows show results for different GPU architectures.

On V100, CUDA achieves 62% of theoretical peak and std::par 58%, so std::par reaches about 94% of the tuned CUDA performance (58/62).

Given how little developer effort it takes to get above 50% of theoretical peak with std::par, it's a no-brainer IMO.

If there is one kernel where you need more performance, you can always implement that kernel in CUDA, but for 99% of the kernels in your program your time might be better spent elsewhere.
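
To give a rough feel for what "std::par" code looks like, here is a minimal sketch of a SAXPY loop written with a standard parallel algorithm. It is not taken from the paper; the `saxpy` helper and its signature are made up for illustration. The assumption is that a compiler such as nvc++ with `-stdpar` can offload this call to the GPU, while on a regular CPU toolchain it simply runs on the host (GCC's parallel algorithms may need to be linked against TBB).

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Hypothetical helper (not from the paper): computes y[i] = a * x[i] + y[i].
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    // The execution policy is the only "parallel" part of the code.
    // There is no explicit kernel, thread index, or launch configuration.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(),   // first input range
                   y.begin(),            // second input range
                   y.begin(),            // output, written in place
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

A hand-written CUDA version of the same loop would typically need a `__global__` kernel, an index computation with a bounds check, a launch configuration, and some form of device memory management, which is roughly where the difference in developer effort comes from.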