by gerbenst
1602 days ago
The key point of this article (I'm the author) is that even if cmov were an instruction with something like 50 cycles of latency, and the penalty of a mispredicted branch were 0 (i.e. once the CPU sees that a branch was mispredicted, it would immediately start executing instructions from the correct path), then cmov would still be faster on modern CPUs. In branchless quicksort, no matter how long it takes to figure out where an element has to go relative to the pivot, that doesn't obstruct the CPU's ability to start working on the next element in the next cycle.
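
To make that concrete, here is a minimal sketch of a branchless partition step in C. It is not the code from the article: the function name `partition_branchless` and the Lomuto-style layout are assumptions chosen for illustration. The point it tries to show is that the comparison result only feeds an addition on the store index, so the loop has no data-dependent branch for the CPU to guess. Whether a compiler actually emits cmov, setcc, or something else here depends on the compiler and flags.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Moves every element smaller than `pivot` to the front of `a`
 * and returns how many such elements there are. */
size_t partition_branchless(int *a, size_t n, int pivot) {
    size_t store = 0;               /* a[0..store) holds elements < pivot */
    for (size_t i = 0; i < n; i++) {
        int x = a[i];
        /* Unconditional swap of a[i] and a[store]; no branch on the data. */
        a[i] = a[store];
        a[store] = x;
        /* The comparison yields 0 or 1 and is consumed as plain arithmetic,
         * so the only branch left is the well-predicted loop condition. */
        store += (size_t)(x < pivot);
    }
    return store;
}
```

The loads and stores for element i+1 do not depend on the comparison for element i except through `store`, which is a short dependency chain; that is the property the argument above relies on.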