by userbinator
2758 days ago
So it's an issue of the optimizer; as is often the case, it unrolls too aggressively and shoots itself in the foot, all the while missing out on various other opportunities. In my experience, loop unrolling should basically never be done except in extremely degenerate cases; I remember not long ago someone I know who also optimises Asm remarking "it should've died along with the RISC fad". The original goal was to reduce per-iteration overhead associated with checking for end-of-loop, but any superscalar/OoO/speculative processor can "execute past" those instructions anyway; all that unrolling will do is bloat the code and work against caching. Memory bandwidth is often the bottleneck, not the core.
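To make the trade-off concrete, here is a rough C sketch of what unrolling by four does to a simple sum. The function names are made up for illustration, and this is a hand-written approximation of the transformation, not what any particular compiler emits.

```c
#include <stddef.h>

/* Rolled: one end-of-loop compare-and-branch per element. */
static long sum_rolled(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4 (hypothetical hand-written version): one compare-and-branch
 * per four elements, at the cost of a larger body plus a remainder loop. */
static long sum_unrolled4(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)  /* leftover 0..3 elements */
        s += a[i];
    return s;
}
```

Both return the same result; the unrolled one just trades fewer branches for more instruction bytes, which is the code-size cost being complained about here.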
|
Not true. Like many such optimizations, loop unrolling can be useful because it makes downstream loads constant.
For example:
In this case, the compiler probably wants to unroll the loops so that it can straightforwardly forward the constant matrix entries directly to the matrix arithmetic. It'll likely be able to eliminate lots of operations that way. (You might ask "who would write this code?" As Schemers say: "macros do.")
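A minimal C sketch of the kind of code in question, assuming a small constant matrix applied to a vector. The matrix `M` and both function names are hypothetical, and the second function is only a hand-written guess at what full unrolling plus constant folding could leave behind, not actual compiler output.

```c
/* Hypothetical constant matrix, mostly zeros and ones, the sort of thing
 * generated code tends to contain. */
static const int M[3][3] = {
    {1, 0, 0},
    {0, 2, 0},
    {0, 0, 1},
};

/* Rolled: each M[i][j] is a load indexed by the loop variables. */
static void apply_rolled(const int v[3], int out[3]) {
    for (int i = 0; i < 3; i++) {
        out[i] = 0;
        for (int j = 0; j < 3; j++)
            out[i] += M[i][j] * v[j];
    }
}

/* After full unrolling, every M[i][j] is a known constant, so the
 * multiplies by 0 and 1 fold away and only this much work remains. */
static void apply_unrolled(const int v[3], int out[3]) {
    out[0] = v[0];
    out[1] = 2 * v[1];
    out[2] = v[2];
}
```

The rolled version does nine multiplies and nine matrix loads; once the indices are constants, one multiply is left and the matrix is never read.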
See LLVM's heuristics: http://llvm.org/doxygen/LoopUnrollPass_8cpp.html#ad7c38776d7...