by BeeOnRope
2755 days ago
Actually, unrolling is often very important. In some cases it is even more important on modern high-speed out-of-order cores. For example, you might need several accumulators to hide instruction or memory latency. For small loops, unrolling matters most of all, since loop-carried dependency chains are dense and the loop overhead is a high fraction of the overall work. It is easy to get a 2x speedup by unrolling a small loop, and even larger speedups are not uncommon. So this "unrolling rarely helps" idea is just as much of a myth as "unrolling never helps". The main problem with unrolling is that compilers usually don't do it intelligently. Loops are usually unrolled if some kind of threshold is met, depending on compiler options, but this always happens in a feed-forward way rather than a feed-back way, which would involve unrolling the loop and analyzing the benefits and costs after further optimization passes.
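
To make the "several accumulators" point concrete, here is a minimal sketch in C. The function names and the unroll factor of 4 are assumptions chosen for illustration, not anything measured. The best factor depends on the add latency and throughput of the target core.

```c
#include <stddef.h>

/* Compact form: each add depends on the previous one, so the loop
   is bound by the latency of a single dependency chain. */
double sum_simple(const double *a, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled 4x with four independent accumulators (hypothetical factor).
   The four chains do not depend on each other, so an out-of-order core
   can overlap their latencies. */
double sum_unrolled4(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Combine the partial sums, then handle the 0..3 leftover elements. */
    double sum = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```

Note that the unrolled version adds the elements in a different order. For floating point that can change the rounding of the result, which is one reason a compiler will not make this transformation on its own under strict floating-point semantics.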
At least, I haven't seen today's (2018) compilers cut dependency chains without a #pragma omp reduction clause or other assistance from the programmer.
I've unrolled loops myself with good success. But it isn't as easy as some people think it is.
Without knowing about dependency chains or ILP, small loops are often best left in compact form. That way you leverage the branch predictor and minimize uop cache usage. I'd argue the typical program benefits more from compact loops.
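
As a sketch of the kind of programmer assistance mentioned above, an OpenMP `reduction` clause tells the compiler it may split the sum into independent partial sums while the source loop stays compact. The function name is hypothetical, and this assumes a compiler with OpenMP SIMD support enabled (for example GCC or Clang with -fopenmp-simd). Without that flag the pragma is ignored and the loop still computes the same sum serially.

```c
#include <stddef.h>

/* Compact loop with an explicit reduction hint. The clause permits the
   compiler to reassociate the additions into several partial sums,
   which cuts the single loop-carried dependency chain. */
float sum_reduction(const float *a, size_t n) {
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

Whether the compiler actually vectorizes or unrolls this is up to the compiler and target, so checking the generated assembly is the only way to be sure.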