by nkurz
4492 days ago
Thanks for putting this together! My immediate question was whether another compiler might be more competitive with the assembly. On my Sandy Bridge processor, I found that GCC was the slowest of the three compilers I tried, but the assembly was still the clear winner over all of them:
gcc 4.8.0: 0.801596113 seconds
icc 14.0.1: 0.739297534 seconds
clang 3.2: 0.706446818 seconds
assembly: 0.104038212 seconds
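To put a number on "clear winner", here's a minimal Python sketch that divides each time by the assembly time. The timings are the ones listed above; the `slowdown_vs` helper is just a name made up for illustration.

```python
# Wall-clock times in seconds, copied from the 'perf stat' runs above.
timings = {
    "gcc 4.8.0": 0.801596113,
    "icc 14.0.1": 0.739297534,
    "clang 3.2": 0.706446818,
    "assembly": 0.104038212,
}


def slowdown_vs(baseline, times):
    """Return each entry's time as a multiple of the baseline's time."""
    base = times[baseline]
    return {name: t / base for name, t in times.items()}


# Hypothetical helper applied to the measured numbers.
ratios = slowdown_vs("assembly", timings)
for name, ratio in ratios.items():
    print(f"{name:>10}: {ratio:.2f}x the assembly time")
```

By that measure the three compiled versions take roughly 6.8x to 7.7x as long as the assembly, so the spread between compilers is small next to the gap to the hand-written code.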
I was surprised by Clang here. It's an older release than the other two, yet it's the fastest. A quick glance at 'perf stat' (which I used for the timing) suggests that although Clang's binary executes fewer instructions per cycle, it gets the job done with fewer instructions in total than the other two.
Although it doesn't seem to make much of a difference here, those are odd flags for Sandy Bridge. Core2 is an earlier generation, and Sandy Bridge supports AVX, which came out after SSE4.2.
If you did want to specify SB, you'd want the unwieldy "-march=corei7-avx". But it's probably better just to use '-mavx' or '-march=native', along with -Ofast or -O3.