by Bromeo 901 days ago
They could probably get a nice speedup by replacing the if condition from your link with a multiplication by (eoe != 0) two lines down.

Good point!

For me the beginner's mistake is this: on a CPU an 'if' costs you roughly a clock cycle (not considering branch prediction), but on a GPU it costs you the sync. In the best case the cores fall back into sync over many loop iterations. More often some cores drift further and further out of sync, and eventually all the other cores have to wait for the last one. Otherwise memory accesses become blocking, and that would cost all cores thousands of cycles.
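
To make the suggestion concrete, here is a rough sketch in plain C of what swapping the 'if' for a multiplication with (eoe != 0) could look like. The linked code isn't reproduced in this thread, so everything except the name eoe is an assumption: the arrays, the accumulation, and the function names sum_branchy and sum_branchless are made up for illustration.

```c
/* Hypothetical stand-in for the linked loop. Only the name `eoe` comes from
   the comment above; `val`, `n` and the accumulation are assumptions. */

/* Branchy version: each element decides whether to take the add. On a GPU,
   threads that disagree on this condition diverge. */
static float sum_branchy(const int *eoe, const float *val, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        if (eoe[i] != 0) {
            acc += val[i];
        }
    }
    return acc;
}

/* Branchless version: (eoe[i] != 0) evaluates to 0 or 1 in C, so the
   multiplication zeroes out the contribution instead of skipping it.
   Every element executes the same instructions. */
static float sum_branchless(const int *eoe, const float *val, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += val[i] * (float)(eoe[i] != 0);
    }
    return acc;
}
```

Two caveats: the versions only agree when val[i] is finite, since NaN or infinity times zero is NaN, and a compiler may already turn a branch this simple into a predicated instruction, so whether it actually helps is something to measure.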