by phire
1675 days ago
The stack engine only handles the adjustment of the stack pointer; the push and pop themselves are converted to regular load/store uops. The store-then-load pattern is optimised by the store buffer, which does store-forwarding: the result of the in-flight store is forwarded to the load without having to go through L1 cache. It's not quite free. You still have to complete the store (the CPU can't assume that optimising away a stack push is safe unless the slot is actually overwritten), and there is still a 4-cycle latency, but that probably isn't an issue thanks to out-of-order execution.
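If it helps to picture it, here's a toy Python model of the idea. It's only a sketch: `ToyCore`, `store_buffer`, `drain` and the 8-byte slot size are names and choices I made up for illustration, not how any real core is organised, and it doesn't model timing or the forwarding latency at all. It just shows the split: push/pop only bump the stack pointer, the data moves as an ordinary store and load, the load is satisfied from the in-flight store, and the store still has to be written out afterwards.

```python
class ToyCore:
    """Toy model: a stack pointer, a store buffer, and a backing 'cache'."""

    def __init__(self, sp=0x1000):
        self.sp = sp              # stack pointer, adjusted by the "stack engine"
        self.store_buffer = []    # in-flight stores as (address, value), oldest first
        self.cache = {}           # stands in for L1 cache / memory

    def push(self, value):
        self.sp -= 8                                # stack engine: pointer adjustment only
        self.store_buffer.append((self.sp, value))  # the data moves as an ordinary store

    def pop(self):
        value, source = self.load(self.sp)          # the data moves as an ordinary load
        self.sp += 8                                # stack engine: pointer adjustment only
        return value, source

    def load(self, addr):
        # Search youngest-first so the most recent in-flight store wins.
        for st_addr, st_value in reversed(self.store_buffer):
            if st_addr == addr:
                return st_value, "forwarded"
        # No matching in-flight store: fall back to the cache.
        return self.cache.get(addr, 0), "cache"

    def drain(self):
        # The stores still have to complete, even if a load already consumed them.
        for addr, value in self.store_buffer:
            self.cache[addr] = value
        self.store_buffer.clear()


core = ToyCore()
core.push(42)
print(core.pop())     # the load is served from the store buffer, not the cache
core.drain()
print(core.cache)     # the pushed value was still written out
```

Note that the pop gets its value before `drain()` ever runs, but the write to `cache` happens regardless. That's the "not quite free" part.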