by Remnant44
712 days ago
|
It's rare to need to work at this level of optimization, but this is a really neat trick! Modern cores are quite wide - capable of executing 6-8 instructions per cycle, as long as there are no dependencies between them. Something as simple and common as a summation loop can often be sped up 2-4x simply by using multiple accumulators that you combine after the loop body; this splits the single loop-carried dependency chain into several independent ones, so the processor can "run ahead" and execute multiple accumulations each cycle. This technique is similar in concept, but even more general: put the "guess" in the registers and run with it, relying on a second instruction within a branch to correct the guess if it's wrong. Assuming your guess is overwhelmingly accurate... this lets you unlock the width of modern cores in code that otherwise wouldn't present a lot of ILP! Clever.
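For anyone who hasn't seen the multiple-accumulator version of a summation loop, here is a minimal sketch in C. The function names `sum_single` and `sum_multi` and the choice of four accumulators are just assumptions for illustration; the right number depends on the core's add latency and throughput, and any speedup should be measured rather than assumed.

```c
#include <stddef.h>

/* Baseline: one accumulator, so each add must wait for the previous one. */
double sum_single(const double *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Illustrative variant: four independent accumulators (the count of four
 * is an assumption, not a tuned value). The four adds in the loop body do
 * not depend on each other, so the core can overlap them. */
double sum_multi(const double *x, size_t n) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;

    /* Main loop: consume four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }

    /* Tail: handle the 0-3 leftover elements. */
    for (; i < n; i++) {
        a0 += x[i];
    }

    /* Combine the partial sums after the loop. */
    return (a0 + a1) + (a2 + a3);
}
```

Note that `sum_multi` adds the elements in a different order than `sum_single`, so with floating-point inputs the two results can differ in the last bits. For integer-valued data small enough to be exact in a `double`, they agree exactly.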
|
Shouldn't the compiler be smart enough to figure that out these days (at least if it truly is a straightforward accumulation loop)?