by ice799
5808 days ago
Building the code in the bug report as a 64-bit binary, plus various system information: http://gist.github.com/483494
The testing harness and the scripts to build and run it: http://gist.github.com/483524 (yes, I was too lazy to write a makefile). You will still need to construct some command-line fu to split the results into separate files so you can load them into whatever maths program you want.
Your microbenchmark appears to be alignment-sensitive. With your assembly code on my machine (a 2.66 GHz Core 2 Quad), running 250 tests, I get:
which are similar results to yours.
But if you add .align 8 right before the definition of test2 in the assembly file (i.e. make it 8-byte aligned, just like test1), I get the following numbers:
so the code that "doesn't use frame pointers" is actually slightly faster, as you might expect.
Additionally, if I simply modify your testcase to use 16-byte alignment rather than 8-byte alignment, I get the following numbers:
I think aligning both test functions to 8 bytes at least makes things fair, but you can see that minor changes in alignment can cause big changes in the results.
You can see the assembly sources I used: http://gist.github.com/483840
FWIW, the code that uses movs rather than pushes and pops ought to be faster since (generally speaking for larger prologues and epilogues) you can execute a series of movs in parallel, whereas your pushes and pops are serialized, since they're all updating a common resource (the stack pointer). Empirical testing on benchmarks like SPEC2k has borne this out, both on x86 and x86-64. (You ought to be able to see this effect with gcc, depending on what cpu you use for the -mtune switch.) As you noted, this strategy carries a size penalty, since movs are somewhat larger than pushes and pops.
I'll also note that on my machine, with gcc saying it's:
I get identical assembly when compiling the testcase from the PR with and without -fomit-frame-pointer. (I should have noted the gcc version I was using, just as you did. My bad.) Furthermore, for:
I also get identical assembly. On one of the servers at work, with:
I get identical assembly. Finally, also at work, with:
which is a somewhat patched version of GCC circa 4.3.2, I get identical assembly. So with four different flavors of GCC, there's no difference on the testcase in the PR with and without -fomit-frame-pointer. I'd be willing to bet that there are no differences with 4.5.x and mainline GCC as well. It looks like Debian may just have a peculiar set of patches to its version of GCC.
EDIT: formatting fixes.