by halomru
3377 days ago
You can't really unroll the loop completely or constant-fold it away, as some suggest, because the number of iterations is determined at run time. But assuming his CPU takes one cycle for an addition and one cycle for a conditional jump, it only needs to run at 1 GHz to achieve his result. Given that branch prediction should be nearly perfect for such a simple and short loop, that modern x86 CPUs can do at least 4 integer additions per cycle, and that typical clock speeds are in the range of 2-4 GHz, a properly optimized version should be nearly an order of magnitude faster. So the fix is either more aggressive compiler flags and maybe SIMD intrinsics, or hand-written assembly (easy here, not so easy in the real world).
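To make that concrete, here is a rough sketch in C. The original benchmark isn't shown in this thread, so I'm assuming it is a plain summation loop whose count is only known at run time; the function names `sum_simple` and `sum_four_lanes` are made up for illustration. The second version keeps four independent accumulators plus a cleanup loop, which is one way to give the CPU several additions it can issue in the same cycle. It is not necessarily what a compiler would emit.

```c
#include <stdint.h>

/* Assumed baseline: one addition and one conditional jump per iteration.
   n is only known at run time, so the loop cannot be folded into a constant. */
uint64_t sum_simple(uint64_t n) {
    uint64_t total = 0;
    for (uint64_t i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* Hypothetical variant: four accumulators with no dependency on each other,
   so a superscalar CPU can overlap the additions. */
uint64_t sum_four_lanes(uint64_t n) {
    uint64_t a = 0, b = 0, c = 0, d = 0;
    uint64_t i = 0;

    /* Main loop: handles four elements per trip while at least four remain. */
    for (; n - i >= 4; i += 4) {
        a += i;
        b += i + 1;
        c += i + 2;
        d += i + 3;
    }

    /* Cleanup loop: the 0 to 3 leftover elements when n is not a multiple of 4. */
    uint64_t total = a + b + c + d;
    for (; i < n; i++) {
        total += i;
    }
    return total;
}
```

Both functions return the same value for any `n`. Whether the second one is actually faster depends on the compiler and flags, so it needs to be measured rather than assumed.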
|
Huh?