|
|
|
|
|
by raphlinus
1104 days ago
|
|
That's correct, it's the memory scope that I expect to be device-scoped. GPUs tend not to have execution barriers in the shader language beyond workgroup scope; generally the next coarser granularity for synchronization is a separate dispatch. However, single-pass prefix sum algorithms, including decoupled look-back, can function just fine with device-scoped memory barriers, and do not require execution barriers with coarser scope than workgroup. |
|