Hacker News
by kcb 427 days ago
CUDA offers grid-wide synchronization through cooperative groups, and it's pretty efficient. There's also CUDA Graphs if you know ahead of time which kernels you're launching.
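For anyone who hasn't used it, here's a rough sketch of what a grid-wide sync looks like. The kernel `two_phase` and the helper `run_two_phase` are names I made up for illustration, and the sizing logic assumes a device that reports cooperative launch support. The main constraint is that every block has to be resident at once, so the grid is sized from the occupancy query and the kernel uses grid-stride loops.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Phase 1 fills buf; phase 2 reads a neighbour's element, which is only safe
// once every block in the grid has finished phase 1.
__global__ void two_phase(int *buf, int *out, int n) {
    cg::grid_group grid = cg::this_grid();
    int stride = gridDim.x * blockDim.x;
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    for (int i = tid; i < n; i += stride) buf[i] = 2 * i;

    grid.sync();  // barrier across the whole grid, not just one block

    for (int i = tid; i < n; i += stride) out[i] = buf[(i + 1) % n];
}

// Hypothetical host helper: launches the kernel cooperatively and returns
// true if the result matches what phase 2 should have produced (n > 0).
bool run_two_phase(int n) {
    int dev = 0, supported = 0, numSms = 0, blocksPerSm = 0;
    const int threads = 256;
    if (cudaGetDevice(&dev) != cudaSuccess) return false;
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, dev);
    cudaDeviceGetAttribute(&numSms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, two_phase,
                                                  threads, 0);
    // All blocks must be co-resident, so cap the grid at what the device fits.
    int blocks = numSms * blocksPerSm;
    if (!supported || blocks == 0) return false;

    int *buf = nullptr, *out = nullptr;
    if (cudaMalloc(&buf, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(buf);
        return false;
    }

    // grid.sync() requires a cooperative launch, not the <<<...>>> syntax.
    void *args[] = {&buf, &out, &n};
    cudaError_t err = cudaLaunchCooperativeKernel(
        (void *)two_phase, dim3(blocks), dim3(threads), args, 0, 0);

    std::vector<int> host(n);
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), out, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(buf);
    cudaFree(out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (host[i] != 2 * ((i + 1) % n)) return false;
    return true;
}
```

Calling `run_two_phase(1 << 16)` should return true on a GPU that supports cooperative launch. Depending on the toolkit version you may need to build with relocatable device code (`-rdc=true`) for `grid.sync()`.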