|
|
|
|
|
by calibraxis
4089 days ago
|
|
Why does `push eax` perform "much faster" than the following? sub esp, 4
mov DWORD PTR SS:[esp], eax
A brief skim of Intel's "Software Developer’s Manual" (particularly ch. 6 on stacks), didn't seem to find an answer.While hitting the ALU just for `sub` might be an extra step, doesn't hitting RAM make that a drop in the bucket? (`sub` may account for less than 1%?) Or is there some caching going on, so RAM may be updated in the background? (I'm not an assembly programmer; very ignorant of what's happening.) |
|
Memory reads/writes do take a few more cycles to complete, but since this is a write, the CPU can continue on with other non-dependent instructions following it. All the above information assumes a CPU based on P6 and its successors (Core, Nehalem, Sandy Bridge, Ivy Bridge, Haswell, etc.); NetBurst and Atom are very different.
Linus also has some interesting things to say about using the dedicated stack instructions: http://yarchive.net/comp/linux/pop_instruction_speed.html
Somewhat amusingly, GCC was well known to generate the explicit sub/mov instructions by default, while most other x86 C compilers I knew of, including MSVC and ICC, would always use push.