Hacker News new | ask | show | jobs
by halomru 3377 days ago
You can't really do loop unrolling and constant folding as some suggest because the number of iterations is determined at run time.

But assuming his CPU takes one cycle for an addition and one cycle for a conditional jump, his CPU only needs to run at 1GHz to achieve his result.

Given that branch prediction should be nearly perfect for such a simple and short loop, modern x86 CPUs doing at least 4 integer additions per cycle, and typical CPU speeds in the range of 2-4 GHz, a properly optimized version should be nearly an order of magnitude faster.

So either more aggressive compiler flags and maybe SIMD intrinsics, or hand written assembly (easy here, not so easy in the real world)

1 comments

> You can't really do loop unrolling and constant folding as some suggest because the number of iterations is determined at run time.

Huh?

That question is a bit unspecific :D

>NUMBER = atoi(argv[1]);

argv is populated at runtime based on what the OS passed you.

But you are right if you meant that my statement is too strong: you can't do arbitrary loop unrolling. You can jump to a loop that's unrolled once if the number is even, etc.

There are certainly also optimizers that will just output the input without any loop, but that's beyond anything I would expect of a current production compiler.

Well, you could do arbitrary loop unrolling - with Duff's Device:

https://en.wikipedia.org/wiki/Duffs_device

If you are going from 0 -> NUMBER and adding one to a variable every iteration, what will the variable's final result be? NUMBER. The compiler can see this and will just do the following:

    s = i = atoi(argv[1]);
Now the loop is gone.

Edit: This will still happen even if you're doing other stuff in there. GCC will attempt to inline and optimize like this. If it is impossible then you will see a loop with most of the stuff factored out of it.