by monocasa 2823 days ago
Stack accesses _are_ different in hardware these days, which is why AArch64 brings the stack pointer into the ISA level vs AArch32, and why on modern x86 using RSP like a normal register devolves into slow microcoded instructions. There's a huge, complex stack engine backing them that does in fact give you better average access times than regular fetches to cache, as long as you use it like a stack, with stack-like data access patterns. The current stack frame can almost be thought of as L½.

The stack pointer is just that, a pointer. It points to a region of the heap. It can point anywhere. It's a data structure the assembly knows how to navigate, but it's not some special thing. You can point it anywhere, and change that whenever you want. Just like you can with any other heap-allocated data structure.
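
To make the "you can point it anywhere" claim concrete, here is a rough sketch using the POSIX `ucontext` calls (assuming Linux with glibc). It runs a function on a `malloc`'d buffer and checks that a local variable of that function lives inside the buffer. The names `runs_on_heap_stack`, `on_heap_stack` and `local_addr` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static volatile uintptr_t local_addr;   /* address of a local seen on the new stack */

/* Runs with the stack pointer inside the malloc'd buffer. */
static void on_heap_stack(void) {
    int local = 0;
    local_addr = (uintptr_t)&local;
    /* Returning resumes main_ctx through uc_link. */
}

/* Returns 1 if the function's local lived inside the heap buffer,
 * 0 if it did not, -1 on setup failure. */
int runs_on_heap_stack(size_t size) {
    assert(size > 0);
    char *stack = malloc(size);
    if (stack == NULL)
        return -1;

    if (getcontext(&co_ctx) == -1) {
        free(stack);
        return -1;
    }
    co_ctx.uc_stack.ss_sp = stack;      /* the new "stack" is plain heap memory */
    co_ctx.uc_stack.ss_size = size;
    co_ctx.uc_link = &main_ctx;         /* where to go when on_heap_stack returns */
    makecontext(&co_ctx, on_heap_stack, 0);

    if (swapcontext(&main_ctx, &co_ctx) == -1) {
        free(stack);
        return -1;
    }

    uintptr_t lo = (uintptr_t)stack;
    int inside = local_addr >= lo && local_addr < lo + size;
    free(stack);
    return inside;
}
```

Calling `runs_on_heap_stack(64 * 1024)` should return 1 on a typical setup. This only shows that the stack pointer can be redirected; it says nothing either way about how the hardware treats accesses through it.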

It occupies the same L1/L2 cache as any other memory. There are no decreased access times or fetches other than the fact that it just happens to be more consistently in L1 due to access patterns. And this is a very critical aspect of the system, as it also means it page faults like regular memory, allowing the OS to do all sorts of things (grow on demand, various stack protections, etc.).

Google "stack engine". Huge portions of the chip are dedicated to this; if it makes you feel better you can think of it as fully associative store buffers optimized for stack-like access. And all of this is completely separate from regular LSUs.

There's a reason why SP was promoted to a first class citizen in AArch64 when they were otherwise removing features like conditional execution.

That's also the reason why using RSP as a GPR on x86 gives you terrible perf compared to the other registers: it flips back and forth between the stack engine and the rest of the core and has to manually synchronize in ucode.

EDIT: Also, the stack is different to the OS generally too. On Linux you throw in the flags MAP_GROWSDOWN | MAP_STACK when building a new stack.
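
For reference, a minimal sketch of what passing those flags to `mmap` might look like on Linux. The helper name `stack_region_roundtrip` is hypothetical, and the sketch only maps a region, touches its highest byte (where a downward-growing stack would start), and unmaps it. It does not actually switch a thread onto the mapping.

```c
#define _GNU_SOURCE             /* exposes MAP_GROWSDOWN and MAP_STACK in glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Maps `size` bytes with the stack-related flags, writes to the top
 * byte, then unmaps. Returns 0 on success, -1 on failure.
 * `size` is assumed to be a multiple of the page size. */
int stack_region_roundtrip(size_t size) {
    assert(size > 0);
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN | MAP_STACK,
                      -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* A downward-growing stack starts at the high end of the mapping. */
    char *top = (char *)base + size;
    top[-1] = 1;

    return munmap(base, size) == 0 ? 0 : -1;
}
```

A real thread or coroutine setup would hand `base + size` (suitably aligned) to whatever sets the initial stack pointer, e.g. `clone` or `makecontext`.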