Hacker News new | ask | show | jobs
by raphlinus 1104 days ago
That's correct, it's the memory scope that I expect to be device-scoped. GPUs tend not to have execution barriers in the shader language beyond workgroup scope; generally the next coarser granularity for synchronization is a separate dispatch. However, single-pass prefix sum algorithms, including decoupled look-back, can function just fine with device-scoped memory barriers, and do not require execution barriers with coarser scope than workgroup.