by kid0m4n 4636 days ago
Fair enough... let me do a quick test with -O2 and see how that fares

Might also want to give -Os (optimize for small code size) a go. On code that spends its time iterating over the same instructions over and over again, this can be a big win.

[edit] Nope, definitely not better. I get O2 being a slight win over O3 and Os being much worse.

Compiler optimization flags are very code- and type-specific.

(Note that I am comparing apples to oranges here, I used the C++ code used in Rust experiments found here: https://github.com/huonw/card-trace/blob/master/original.cpp )

I changed the C++ version's typedef float f to typedef double f, so it uses doubles instead of floats, and compiled with the following flags:

    -m64 -march=corei7-avx -mtune=corei7-avx -Ofast -funroll-all-loops
and the run time dropped from 17.5 seconds to 11.2 seconds. If I remove -funroll-all-loops, the run time jumps to 14.2 seconds. The original 17.5 seconds was measured with the vanilla code using float and -O3. Interestingly enough, if you use the aforementioned flags with floats instead of doubles, the program executes in 15.01 seconds. For this code, using floats is bad for performance! Further, if you remove -funroll-all-loops when using floats, performance improves, but with doubles it gets worse.
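
For anyone who wants to try the float/double swap on their own code, here is a minimal sketch of the idea. It is not the ray tracer from the link: the USE_DOUBLE macro, scaled_sum and time_scaled_sum are made-up names for illustration, and the loop is only a stand-in for a real hot path, so the timings it gives say nothing about the numbers quoted above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical switch (not in the original code): define USE_DOUBLE at
// build time to get the double variant, otherwise f is float.
#ifdef USE_DOUBLE
typedef double f;
#else
typedef float f;
#endif

// Stand-in hot loop: returns the sum of i * scale for i in [0, n).
f scaled_sum(std::size_t n, f scale) {
    f acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<f>(i) * scale;
    }
    return acc;
}

// Runs scaled_sum once, stores its value in *result, and returns the
// elapsed wall-clock time in seconds.
double time_scaled_sum(std::size_t n, f scale, f* result) {
    assert(result != nullptr);
    auto start = std::chrono::steady_clock::now();
    *result = scaled_sum(n, scale);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Build it once as is and once with -DUSE_DOUBLE, keep every other flag identical, and compare what time_scaled_sum returns. Printing or otherwise using the result keeps the compiler from throwing the loop away.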

So, when optimizing, play with compiler flags. Play with types. Play with whatever you have at your disposal and make no assumptions. This stuff is far more complex than believing that certain flags are better than others; it depends on the code, the types, and the target CPU.

So it totally disables all loop unrolling, inlining... hmm
It does a couple of other things too, including choosing instruction sequences that are more compact, afaik. It also favours compactness over alignment and, obviously, jumps over unrolling. This isn't code that benefits terribly much from it, but it has been known to happen.