Hacker News
by tomsmeding 2753 days ago
In addition to the optimisations already mentioned, loop unrolling also typically enables vectorisation in compilers. You might argue that vectorisation doesn't strictly need the relevant operations to sit next to each other in a contiguous instruction stream, but having them adjacent makes the vectorisation pass a lot nicer and simpler (if it can be called that to begin with).

Assuming you unroll just enough to fill a SIMD register. As mentioned, in this case aggressive (16-fold) unrolling actually appears to have prevented vectorization. (A smart enough vectorizer could of course handle this, but unrolling just to “re-roll” in a later pass doesn’t sound very smart.)
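
To make "unroll just enough" concrete, here is a rough sketch in C. It assumes a 128-bit SIMD register holding four floats. That width is an assumption for illustration, and the function names add_rolled and add_unrolled4 are made up. The unrolled body puts four independent adds next to each other, which is the shape that lines up with one packed add. Whether a given compiler actually vectorises either version depends on the compiler, flags and target, so treat this as an illustration rather than a guarantee.

```c
#include <stddef.h>

/* Rolled loop: one element per iteration. */
void add_rolled(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Unrolled by 4, matching the four float lanes of an assumed
 * 128-bit SIMD register. The four adds are independent and adjacent. */
void add_unrolled4(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] = a[i + 0] + b[i + 0];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }
    /* Scalar tail for the leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Both functions compute the same result. Comparing the assembly a compiler emits for each (and for a 16-fold variant) is one way to check which unroll factor the vectoriser copes with.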