| A GPU followup to this article. While on CPU sequentially consistent semantics are efficient to implement, that seems to be much less true on GPU. Thus, Vulkan completely eliminates sequential consistency and provides only acquire/release semantics[1]. It is extremely difficult to reason about programs using these advanced memory semantics. For example, there is a discussion about whether a spinlock implemented in terms of acquire and release can be reordered in a way to introduce deadlock (see reddit discussion linked from [2]). I was curious enough about this I tried to model it in CDSChecker, but did not get definitive results (the deadlock checker in that tool is enabled for mutexes provided by API, but not for mutexes built out of primitives). I'll also note that using AcqRel semantics is not provided by the Rust version of compare_exchange_weak (perhaps a nit on TFA's assertion that Rust adopts the C++ memory model wholesale), so if acquire to lock the spinlock is not adequate, it's likely it would need to go to SeqCst. Thus, I find myself quite unsure whether this kind of spinlock would work on Vulkan or would be prone to deadlock. It's also possible it could be fixed by putting a release barrier before the lock loop. We have some serious experts on HN, so hopefully someone who knows the answer can enlighten us - mixed in of course with all the confidently wrong assertions that inevitably pop up in discussions about memory model semantics. [1]: https://www.khronos.org/blog/comparing-the-vulkan-spir-v-mem... [2]: https://rigtorp.se/spinlock/ |
[1]: https://plv.mpi-sws.org/scfix/full.pdf