by thelucky41 4574 days ago
I've run into something similar on a different benchmark, where inserting some NOPs into the preamble of a function actually sped it up by as much as 14%, because it made the function align better with a memory boundary so the CPU could fetch it faster. Benchmarks, especially ones that don't control for cases where luck, alignment, register use, etc. can influence the outcome, are terrible test cases for exploring behaviour.
2 comments

Google's microoptimizer actually has "try inserting random NOOPs" as one of its optimization passes: http://code.google.com/p/mao/
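For anyone wondering what "insert some NOPs" amounts to arithmetically, here is a minimal sketch of computing the padding needed to push a function start up to the next boundary. This is not MAO's code or its algorithm; `nop_padding` and the address `0x401A37` are made up for illustration, and the only x86 fact relied on is that `0x90` is the one-byte NOP.

```python
def nop_padding(addr: int, alignment: int) -> int:
    """Bytes of padding needed to move addr up to the next multiple of alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Python's % always returns a non-negative result, so this is the
    # distance from addr to the next aligned address (0 if already aligned).
    return -addr % alignment


# Hypothetical function start address, chosen only for illustration.
func_addr = 0x401A37
pad = nop_padding(func_addr, 64)

# 0x90 is the one-byte x86 NOP; a real assembler may emit longer NOP forms.
padding_bytes = b"\x90" * pad

print(pad, hex(func_addr + pad))  # 9 0x401a40
```

A "random NOPs" pass would presumably try several pad sizes and measure each one, rather than compute a single value like this.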
He mentioned in the comments that he ensured everything is aligned to 64-bit boundaries.
64-bit might not be the right alignment, and even when aligned to 64-bit boundaries you can run into page and cache-line issues.
Don't you always have to pad back with NOPs (0x90), since the scheduler might be looking ahead for other instructions to execute? (Or maybe not, since it's seeing the jmp back.) Just wildly guessing here...
Just to be clear, I tried it with 64-byte alignment.
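The 64-bit vs 64-byte distinction in this subthread is easy to check with a few lines of arithmetic. A rough sketch, assuming 64-byte cache lines and 4096-byte pages (common on x86, but an assumption here); the addresses and the 24-byte code size are invented for the example, and `crosses` is just a helper name:

```python
CACHE_LINE = 64   # assumed cache-line size in bytes
PAGE = 4096       # assumed page size in bytes


def crosses(addr: int, size: int, boundary: int) -> bool:
    """True if the bytes [addr, addr + size) span more than one boundary-sized block."""
    if size <= 0:
        raise ValueError("size must be positive")
    return addr // boundary != (addr + size - 1) // boundary


# Hypothetical 24-byte loop body at an 8-byte (64-bit) aligned address.
print(0x1038 % 8 == 0)                  # True: 64-bit aligned
print(crosses(0x1038, 24, CACHE_LINE))  # True: still straddles two cache lines

# The same code at a 64-byte aligned address fits in one line.
print(crosses(0x1040, 24, CACHE_LINE))  # False

# 8-byte aligned, but straddling a page boundary.
print(crosses(0xFF8, 24, PAGE))         # True
```

So 8-byte alignment alone doesn't rule out a cache-line or page split. 64-byte alignment does for code that fits in one line, but anything longer than 64 bytes spans several lines regardless of where it starts.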