by g0b 1493 days ago
I'm in fact talking about post-Volta hardware there, but this is not about forward progress. I meant that using __ballot_sync() and getting it wrong will deadlock the GPU (i.e. passing it the __activemask() taken from outside an if, but calling it in only one branch of the if, so some of the threads named in the mask never participate in the sync).

It's a powerful abstraction to expose (since statically _different_ locations can sync with each other), but also a risky one, compared to GLSL, where it's impossible to deadlock anything by using subgroup intrinsics.
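
To make the failure mode concrete, here is a minimal sketch of the pattern described above next to one common way of avoiding it. The kernel and helper names (risky_ballot, safe_ballot, run_safe_ballot) are made up for illustration, the 16-lane split is an arbitrary assumption, and error checking is omitted. The risky kernel is only there to show the shape of the bug: its behavior is undefined (it may hang), so the helper never launches it.

```cuda
#include <cuda_runtime.h>

// Risky shape (illustrative only, never launched here): the mask is taken
// outside the branch, so it names every lane that was active there, but only
// lanes 0..15 ever reach the ballot. The other lanes named in the mask never
// participate, which is undefined behavior and may hang.
__global__ void risky_ballot(unsigned *out) {
    unsigned mask = __activemask();            // taken outside the if
    if (threadIdx.x < 16) {                    // only half the warp enters
        unsigned votes = __ballot_sync(mask, threadIdx.x % 2 == 0);
        if (threadIdx.x == 0) *out = votes;
    }
}

// Safer shape: build the mask with a ballot that every lane of the warp
// executes, so it names exactly the lanes that will take the branch.
// Assumes the kernel is launched with one full 32-lane warp.
__global__ void safe_ballot(unsigned *out) {
    unsigned mask = __ballot_sync(0xffffffffu, threadIdx.x < 16);
    if (threadIdx.x < 16) {
        // Every lane named in mask executes this same intrinsic.
        unsigned votes = __ballot_sync(mask, threadIdx.x % 2 == 0);
        if (threadIdx.x == 0) *out = votes;
    }
}

// Hypothetical host helper: launches safe_ballot on one warp and returns
// the ballot result written by lane 0.
unsigned run_safe_ballot() {
    unsigned *d_out = nullptr;
    unsigned h_out = 0;
    cudaMalloc(&d_out, sizeof(unsigned));
    cudaMemset(d_out, 0, sizeof(unsigned));
    safe_ballot<<<1, 32>>>(d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(unsigned), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Calling run_safe_ballot() on a device with 32-lane warps should return 0x5555: the even-numbered lanes among lanes 0..15 vote true, and lanes outside the mask contribute nothing.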

1 comment

That's indeed quite a raw abstraction, but it's way too powerful performance-wise not to expose...
Perhaps it makes sense for CUDA to expose it, but it certainly can't make sense for SPIR-V, which has to work on a variety of hardware, most of which doesn't do ITS (independent thread scheduling).