|
|
|
|
|
by gpderetta
1477 days ago
|
|
The slower algo will use more instructions as will have to run for more iterations. The faster algo will use wider ALUs that are more power hungry, so it appears a wash. But because less instructions are in flight, less power need to be spent to keep track of them or renaming registers, caches need to powered for a smaller amount of time and so on. |
|
The whole point of this thread is realization of opposite. Slower algo executes only 2 instructions in a loop, but second one directly depends on the result of the first (induces pipeline stall) while the fast version brute forces a ton of instructions at full CPU IPC.