by goldenkey 2816 days ago
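CUDA has Cooperative Groups now (CUDA 9 and later). On GPUs that support cooperative launch, including Volta and Turing, this allows synchronization across the entire grid, i.e. between all thread blocks (workgroups, in OpenCL terms), rather than just locally within a block. So you can pretty much keep your entire job on the GPU in a single kernel, even if it has multiple phases that would otherwise need separate kernel launches. Really important for complex jobs where performance is a must.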

https://devblogs.nvidia.com/cooperative-groups
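
To make the grid-wide sync concrete, here is a rough sketch (not taken from the linked post) of two dependent phases fused into one kernel. The names `two_phase` and `run_two_phase` are made up for illustration, and the block size of 128 is an arbitrary assumption. Phase 2 reads values written by other blocks in phase 1, so it is only correct because `grid.sync()` sits between them. Depending on toolkit version, you may need to compile with `-rdc=true`.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical kernel: phase 1 doubles each element, phase 2 reads the
// mirrored element, which was generally written by a different block.
__global__ void two_phase(const int* in, int* tmp, int* out, int n) {
    cg::grid_group grid = cg::this_grid();
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    // Phase 1 (would normally be its own kernel launch).
    for (int i = tid; i < n; i += stride) tmp[i] = 2 * in[i];

    // Barrier across every thread block in the grid.
    grid.sync();

    // Phase 2 depends on phase 1 results from the whole grid.
    for (int i = tid; i < n; i += stride) out[i] = tmp[n - 1 - i] + 1;
}

// Hypothetical host helper. Fills the input with 0..n-1, runs the kernel,
// and returns false if there is no usable device or a CUDA call fails.
bool run_two_phase(int n, std::vector<int>& result) {
    int dev = 0, coop = 0, sms = 0, perSm = 0;
    const int threads = 128;  // assumed block size
    if (n <= 0 || cudaGetDevice(&dev) != cudaSuccess) return false;
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    if (!coop) return false;  // device cannot do grid-wide sync

    // All blocks must be resident at once, so size the grid from occupancy.
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&perSm, two_phase, threads, 0);
    int blocks = sms * perSm;
    if (blocks < 1) return false;

    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *d_in = nullptr, *d_tmp = nullptr, *d_out = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    cudaError_t err = cudaMalloc(&d_in, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_tmp, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_out, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);

    // grid.sync() is only valid when the kernel is launched cooperatively.
    void* args[] = {&d_in, &d_tmp, &d_out, &n};
    if (err == cudaSuccess)
        err = cudaLaunchCooperativeKernel((void*)two_phase, dim3(blocks),
                                          dim3(threads), args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    result.resize(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_tmp);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Calling `run_two_phase(1000, result)` from `main` should leave `result[i] == 2 * (999 - i) + 1` on a device that supports cooperative launch. Without the barrier you would need two launches, or you would race on `tmp`. The catch is that the grid can't be bigger than what fits on the device at once, hence the occupancy query and the grid-stride loops.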