| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by cavisne 868 days ago
	Causes every thread in the block to wait until they have reached this point. Worth reading a cuda primer for more details on blocks/warps. Since the threads are relying on each other to fill the SRAM with all needed data if you didn’t wait then values would be missing.

1 comments

Any CUDA primer you recommend in particular? I had this same question.

Here's an article on syncing in CUDA via cooperative groups: https://developer.nvidia.com/blog/cooperative-groups/

There's also explicit warp synchronization, i.e. __syncwarp(). More on warp primitives here: https://developer.nvidia.com/blog/using-cuda-warp-level-prim...

Probably https://www.youtube.com/watch?v=nOxKexn3iBo (or just skimming the attached colab).

This is terrific, thanks!