| HN Mirror

by phire 1675 days ago

The stack engine only handles the adjustment of the stack pointer, converting the push and pop to regular load/store uops.

But the store-then-load pattern is optimised by the store buffers, which do store-forwarding: the result of the in-flight store is forwarded to the load without having to go through L1 cache.

It's not quite free: you still have to complete the store (the CPU can't assume optimising away a stack push is safe unless the slot is actually overwritten), and there is still a 4-cycle latency, but that probably isn't an issue due to out-of-order execution.
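
To make the push/pop case concrete, here is a minimal toy model in Python. It is only a sketch: `ToyStoreBuffer`, `push` and `pop` are made-up names, the 8-byte slot size assumes x86-64, and nothing here models timing or real hardware structures. It just shows a pop's load being satisfied from an in-flight store, while that store still has to drain to memory afterwards.

```python
class ToyStoreBuffer:
    """Toy model: stores wait in a buffer until they drain to memory."""

    def __init__(self):
        self.memory = {}    # stands in for L1 cache / RAM
        self.pending = []   # in-flight stores, oldest first: (addr, value)

    def store(self, addr, value):
        # The store is only queued here; memory is not updated yet.
        self.pending.append((addr, value))

    def load(self, addr):
        # Search youngest-first for an in-flight store to the same address.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value, "forwarded"
        # No match: fall back to memory (unwritten locations read as 0).
        return self.memory.get(addr, 0), "cache"

    def drain(self):
        # The stores still have to complete, in program order.
        for addr, value in self.pending:
            self.memory[addr] = value
        self.pending.clear()


def push(buf, sp, value):
    # Split as described above: stack pointer adjustment plus a regular store.
    sp -= 8
    buf.store(sp, value)
    return sp


def pop(buf, sp):
    # A regular load plus a stack pointer adjustment.
    value, source = buf.load(sp)
    return sp + 8, value, source


buf = ToyStoreBuffer()
sp = push(buf, 0x1000, 42)
sp, value, source = pop(buf, sp)
print(value, source)             # 42 forwarded
buf.drain()
print(buf.memory[0x1000 - 8])    # 42
```

Even though the popped value never came from `memory`, the slot at `0x1000 - 8` is still written when the buffer drains, which is the "not quite free" part.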

1 comments

brigade 1675 days ago

It gets more "free" once you have the zero-latency loads introduced in Zen 2, where the load can be speculatively replaced with a register move if the store is close and obvious enough.

dataangel 1675 days ago

How can you have a zero latency load?

brigade 1674 days ago

In a similar way to how register movs can have zero latency: the load's output is renamed to the register source of the corresponding store. That takes the load out of the dependency chain, so it effectively has zero latency so long as the correct store was identified.
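
A rough sketch of that renaming idea, as a toy Python model. Everything here is hypothetical: `ToyRenamer` and its methods are invented for illustration, and matching stores to loads by comparing the address text is an assumption of the sketch, not a description of how any real predictor picks the store. It also leaves out the part where the hardware has to check the guess and recover if the wrong store was identified.

```python
class ToyRenamer:
    """Toy rename stage: maps architectural registers to physical ones."""

    def __init__(self):
        self.rat = {}           # architectural name -> physical register id
        self.next_phys = 0
        self.last_store = None  # (address text, physical source register)

    def _alloc(self):
        phys = self.next_phys
        self.next_phys += 1
        return phys

    def write(self, arch_reg):
        # An ordinary instruction producing a new value in arch_reg.
        self.rat[arch_reg] = self._alloc()

    def mov(self, dst, src):
        # Zero-latency move: dst simply aliases src's physical register.
        self.rat[dst] = self.rat[src]

    def store(self, addr_text, src):
        # Remember which physical register fed the most recent store.
        self.last_store = (addr_text, self.rat[src])

    def load(self, dst, addr_text):
        # Guess that the load reads the last store when the address text matches.
        if self.last_store is not None and self.last_store[0] == addr_text:
            self.rat[dst] = self.last_store[1]  # renamed, just like mov
            return "renamed"
        self.rat[dst] = self._alloc()           # ordinary load, new register
        return "load"


r = ToyRenamer()
r.write("rax")
r.store("[rsp]", "rax")        # store half of a push
kind = r.load("rbx", "[rsp]")  # load half of the matching pop
print(kind, r.rat["rbx"] == r.rat["rax"])  # renamed True
```

After the renamed load, anything reading `rbx` depends directly on whatever produced `rax`, so the load no longer sits in the dependency chain.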
