A few years ago I got a 2x speedup on math-heavy code (FFTs) by switching from gcc 4.2 back to gcc 3.4. Apparently there was a known issue in gcc's optimizer where it would mark too many values as "keep this in a register", run out of registers, and end up spending a lot of time moving data to and from the spill slots (the "register overflow space") on the stack. (I don't know whether this has been fixed; as I said, I ran into it a few years ago.)
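If you want to check whether your own compiler version is spilling in a hot loop, one rough approach is to compile a small kernel to assembly and look for extra loads and stores to the stack. The function below is only a hypothetical stand-in, not my original code and not a real FFT: `butterfly8` is a made-up name, and the arithmetic is just butterfly-style sums and differences that keep many values live at once.

```c
#include <stddef.h>

/* Hypothetical kernel for illustration only: butterfly-style sums and
 * differences over 8 inputs. Many temporaries are live at the same time,
 * which is the kind of register pressure that can lead to spills. */
void butterfly8(const double *in, double *out)
{
    /* First stage: pair elements that are 4 apart. */
    double a0 = in[0] + in[4], a1 = in[0] - in[4];
    double a2 = in[2] + in[6], a3 = in[2] - in[6];
    double a4 = in[1] + in[5], a5 = in[1] - in[5];
    double a6 = in[3] + in[7], a7 = in[3] - in[7];

    /* Second stage: combine the sums. */
    double b0 = a0 + a2, b1 = a0 - a2;
    double b2 = a4 + a6, b3 = a4 - a6;

    /* Final stage: write all eight results. */
    out[0] = b0 + b2;
    out[1] = b0 - b2;
    out[2] = b1 + b3;
    out[3] = b1 - b3;
    out[4] = a1 + a3;
    out[5] = a1 - a3;
    out[6] = a5 + a7;
    out[7] = a5 - a7;
}
```

Assuming the file is saved as `butterfly.c`, something like `gcc -O2 -S butterfly.c -o butterfly.s` writes the assembly to `butterfly.s`. On x86, spills typically show up as moves to and from stack-pointer-relative addresses in the middle of the arithmetic. Comparing that output between two compiler versions, together with timing the real code, should show whether you are hitting the same kind of problem.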