by volta83
1681 days ago
This paper ported a CFD application, which had a tuned CUDA implementation, to std::par: https://arxiv.org/pdf/2010.11751.pdf . In Table 3, the first and last columns show the performance of CUDA and std::par as a percentage of theoretical peak, and the rows show results for different GPU architectures. On V100, CUDA achieves 62% of theoretical peak and std::par 58%. The small amount of developer effort required to get over 50% of theoretical peak with std::par makes it a no-brainer IMO. If there is one kernel where you need more performance, you can always implement that kernel in CUDA, but for 99% of the kernels in your program your time is probably better spent elsewhere.
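For anyone who hasn't seen what a std::par kernel looks like, here is a minimal sketch using the C++17 parallel algorithms. The `saxpy` function and its arguments are hypothetical and are not taken from the paper; they only illustrate the shape of the code, assuming a standard library that ships `<execution>`.

```cpp
#include <algorithm>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical kernel for illustration only (not from the paper):
// computes y[i] = a * x[i] + y[i] with a standard parallel algorithm.
// Assumes y has at least as many elements as x.
template <class Policy>
void saxpy(Policy&& policy, float a, const std::vector<float>& x,
           std::vector<float>& y) {
    // Binary std::transform reads x[i] and y[i] and writes the result
    // back into y[i]; the execution policy decides how it may be run.
    std::transform(std::forward<Policy>(policy),
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

Calling `saxpy(std::execution::par_unseq, 2.0f, x, y)` permits parallel and vectorized execution. As far as I know, compiling with `nvc++ -stdpar=gpu` from the NVIDIA HPC SDK is what offloads such calls to the GPU; with other toolchains the same source runs on CPU threads, or serially if no parallel backend is available.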