by userbinator
4089 days ago
It's a lot smaller (1 byte vs 6), which means less space spent in the cache and decoder, reducing cache misses and decode bandwidth.

x86 CPUs have also had a dedicated "stack engine" since the Pentium M (but, not surprisingly, it's absent from NetBurst), which contains an adder and a copy of the stack pointer to handle push/pop operations. This is faster than using the general-purpose ALUs and memory read/write ports, and also frees those up for use by other non-stack instructions. On the other hand, it means that reading/writing the stack pointer explicitly between implicit stack operations incurs a little extra latency to get the values in the stack engine and the "real" ESP register synchronised.

Memory reads/writes do take a few more cycles to complete, but since this is a write, the CPU can continue with the non-dependent instructions following it.

All of the above assumes a CPU based on P6 and its successors (Core, Nehalem, Sandy Bridge, Ivy Bridge, Haswell, etc.); NetBurst and Atom are very different.

Linus also has some interesting things to say about using the dedicated stack instructions: http://yarchive.net/comp/linux/pop_instruction_speed.html

Somewhat amusingly, GCC was well known for generating the explicit sub/mov instructions by default, while most other x86 C compilers I knew of, including MSVC and ICC, would always use push.
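To make the "1 byte vs 6" figure concrete, here is a rough sketch that just counts the bytes of hand-assembled 32-bit encodings. The exact instructions are an assumption on my part (a `push eax` versus a `sub esp, 4` followed by `mov [esp], eax`); the code being discussed may use a different register or offset, which can change the explicit sequence's size.

```python
# Hand-assembled 32-bit x86 encodings. The instruction pair is an
# assumption chosen for illustration, not taken from the original code.
PUSH_EAX = bytes([0x50])                 # push eax
SUB_ESP_4 = bytes([0x83, 0xEC, 0x04])    # sub esp, 4   (83 /5 ib)
MOV_ESP_EAX = bytes([0x89, 0x04, 0x24])  # mov [esp], eax (89 /r with SIB byte)


def code_size(*instructions):
    """Total encoded size, in bytes, of a sequence of instructions."""
    return sum(len(insn) for insn in instructions)


push_size = code_size(PUSH_EAX)
explicit_size = code_size(SUB_ESP_4, MOV_ESP_EAX)

print(f"push eax:                  {push_size} byte(s)")
print(f"sub esp,4 + mov [esp],eax: {explicit_size} byte(s)")
```

The explicit version needs a ModRM byte for each instruction, plus an immediate for the `sub` and a SIB byte for the `[esp]` addressing mode, which is where the extra bytes come from.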