by dragontamer 558 days ago

GPU-style SIMD parallelization cannot have the lanes of one warp take separate if/else branches at the same time.

If/else in GPU land is implemented by having the GPU execute the if() side with (EXEC-mask), THEN the else() side with (not EXEC-mask).

I.e., the exec mask gives the appearance of skipping over the unnecessary code. But in practice, each of the 32 CUDA threads is active on only one branch or the other. And thus, whenever the lanes disagree on the condition, the system must physically execute both sides while throwing away the results from the masked-off lanes.
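To make the masking concrete, here's a rough sketch in plain Python (not real GPU code) of a warp running an if/else. The names `warp_if_else` and `lane_slots` are made up for illustration, and the cost model is an assumption: each side of the branch that has at least one active lane occupies all 32 lanes for one pass, and only active lanes keep their result.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA hardware


def warp_if_else(values, cond, then_fn, else_fn):
    """Simulate one warp executing: if cond(x): x = then_fn(x) else: x = else_fn(x).

    Returns (results, lane_slots), where lane_slots counts how many lane-slots
    the warp occupied, whether or not the lane's result was kept.
    """
    exec_mask = [cond(v) for v in values]  # one bit per lane
    results = list(values)
    lane_slots = 0

    # Pass 1: the if() side, run under EXEC-mask.
    # Skipped only when no lane at all wants this side.
    if any(exec_mask):
        lane_slots += len(values)  # the whole warp is occupied
        for lane, v in enumerate(values):
            if exec_mask[lane]:  # only active lanes commit a result
                results[lane] = then_fn(v)

    # Pass 2: the else() side, run under (not EXEC-mask).
    if not all(exec_mask):
        lane_slots += len(values)
        for lane, v in enumerate(values):
            if not exec_mask[lane]:
                results[lane] = else_fn(v)

    return results, lane_slots


def is_even(x):
    return x % 2 == 0


def double(x):
    return x * 2


def add_hundred(x):
    return x + 100


# Divergent warp: even lanes take the if() side, odd lanes the else() side.
divergent_results, divergent_slots = warp_if_else(
    list(range(WARP_SIZE)), is_even, double, add_hundred
)

# Uniform warp: every lane takes the if() side, so the else() pass is skipped.
uniform_results, uniform_slots = warp_if_else(
    [0] * WARP_SIZE, is_even, double, add_hundred
)
```

With this model the divergent warp pays for 64 lane-slots to produce 32 useful results, while the uniform warp pays for 32. That second case is why branches that all lanes agree on are cheap.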

---------

CPU parallelization, in contrast, truly skips the unnecessary if/else side. It takes a branch predictor to do it well, though.

The exec-mask approach also means that in a GPU, if one of the 32 CUDA threads (aka lanes) in a warp needs to loop 10,000 times, then ALL the lanes in that warp loop 10,000 times (with the other lanes possibly throwing away 9,999+ iterations of work as waste heat).
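The loop case can be sketched the same way. Again this is a plain-Python toy model, not GPU code, and `warp_loop` and its counters are hypothetical names: the warp keeps iterating until its slowest lane is done, and every lane that has already finished counts as a wasted lane-slot for that iteration.

```python
def warp_loop(trip_counts):
    """Simulate a warp where lane i wants to loop trip_counts[i] times.

    Returns (warp_iterations, wasted_lane_slots).
    """
    remaining = list(trip_counts)
    warp_iterations = 0
    wasted_lane_slots = 0

    # The warp as a whole keeps looping while ANY lane still has work.
    while any(r > 0 for r in remaining):
        warp_iterations += 1
        for lane, r in enumerate(remaining):
            if r > 0:
                remaining[lane] = r - 1  # active lane does real work
            else:
                wasted_lane_slots += 1   # masked-off lane idles along

    return warp_iterations, wasted_lane_slots


# 31 lanes need a single iteration, one lane needs 10,000.
iterations, wasted = warp_loop([1] * 31 + [10_000])
```

Here the warp runs 10,000 iterations and each of the 31 short lanes sits idle for 9,999 of them, so 309,969 lane-slots do no useful work.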