by avianes 1434 days ago
> There are a few cross-lane shuffles / reduce instruction but it seems to me that those would be handled in a dedicated execution unit. (they are not really the fast-path/common case)

Yes, you essentially need a (kind of) crossbar for shuffle and value broadcast. As far as I know, though, Nvidia GPUs have no execution unit dedicated to this. Depending on the GPU microarchitecture, shuffle and broadcast may be implemented in different ways (e.g. through the load/store units).

Note that I said "crossbar" for simplicity and because there is little information available. I doubt that every lane-to-lane path really exists in hardware.
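
To make the "crossbar" point concrete, here is a rough host-side model in Python of what a warp shuffle does at the semantic level. It is only a sketch: the function names (`shuffle`, `broadcast`, `shuffle_down`, `warp_reduce_sum`) are made up for illustration, a warp size of 32 is assumed, and nothing here says anything about how the hardware actually routes the data.

```python
WARP_SIZE = 32  # assumed number of lanes per warp

def shuffle(values, src_lanes):
    """General shuffle: lane i receives the value held by lane src_lanes[i].

    Since src_lanes can be any mapping, a single-pass hardware version
    would need a path from every lane to every other lane (a full crossbar).
    """
    assert len(values) == WARP_SIZE and len(src_lanes) == WARP_SIZE
    return [values[src % WARP_SIZE] for src in src_lanes]

def broadcast(values, src_lane):
    """Broadcast: every lane reads the same source lane."""
    return shuffle(values, [src_lane] * WARP_SIZE)

def shuffle_down(values, delta):
    """Lane i reads lane i + delta; lanes that would read past the
    end of the warp keep their own value."""
    src = [i + delta if i + delta < WARP_SIZE else i for i in range(WARP_SIZE)]
    return shuffle(values, src)

def warp_reduce_sum(values):
    """Tree reduction built from shuffle_down; lane 0 ends up with the sum."""
    vals = list(values)
    delta = WARP_SIZE // 2
    while delta > 0:
        shifted = shuffle_down(vals, delta)
        vals = [a + b for a, b in zip(vals, shifted)]
        delta //= 2
    return vals[0]

lanes = list(range(WARP_SIZE))
reversed_lanes = shuffle(lanes, lanes[::-1])   # arbitrary permutation
everyone_gets_5 = broadcast(lanes, 5)          # one source, all destinations
total = warp_reduce_sum(lanes)                 # only fixed-offset patterns
```

The reduction only ever uses a handful of fixed offsets (16, 8, 4, 2, 1), while the general `shuffle` accepts any source mapping. That difference is why the common patterns could in principle be served by something much cheaper than a full crossbar, though the model cannot tell you what a given GPU actually does.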