by lawrenceyan 1481 days ago
Given that in terms of “absolute work” done, the optimization does hold true, is there any situation where it would be beneficial to implement this?

(Super low energy processors, battery powered, etc.?)


Certainly! It makes sense in processors that don't do SIMD or speculative execution. There are a lot of those, but mostly for embedded stuff.
> or speculative execution

This isn't actually taking advantage of speculative execution that much. The only speculation here would be in predicting that the loop repeats, and loop unrolling would mostly negate that cost on CPUs that don't do speculative execution.

The data dependency issue, however, would still be a punishing factor. You'd need a CPU that isn't superscalar. Those do exist but are increasingly uncommon (even 2014's Cortex-M7 was superscalar, although it kinda sounds like ARM backed off on that for later Cortex-M cores?)

Also many low-end / embedded CPUs that are in-order will still do branch prediction.

Those are also the CPUs where multiplication is most likely to be significantly more expensive, or not implemented in hardware at all (though almost everything has a multiplier these days).
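
To make the trade-off concrete, here is a minimal sketch in C. The thread doesn't restate the optimization being discussed, so this assumes it is the usual strength reduction: replacing a per-iteration multiply with a running addition. The function names `fill_mul` and `fill_add` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Variant A: one multiply per element. Each iteration depends only on i,
 * so iterations are independent of one another. */
void fill_mul(uint32_t *out, size_t n, uint32_t step) {
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint32_t)i * step;
    }
}

/* Variant B (assumed form of the "optimization"): no multiply, but every
 * iteration needs the value of acc produced by the previous one, which is
 * a loop-carried data dependency. */
void fill_add(uint32_t *out, size_t n, uint32_t step) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = acc;
        acc += step;
    }
}
```

Both variants write the same values (unsigned arithmetic wraps identically in each). On a core with a slow or missing hardware multiplier, Variant B swaps the multiply for an add, which is where it could plausibly win. The chain through `acc` is the data dependency mentioned above; how much it costs depends on the add latency and on how many independent operations the core could otherwise overlap.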