Hacker News
by mnem 3262 days ago
It has to get put on the stack at some point so you can call more than 1 function deep. So why not always put it on the stack so that you don't waste a valuable register?

The answer to "why not always put it on the stack" is "because a lot of functions are leaf functions and so always writing it to the stack is making every function pay the memory access hit rather than just the ones that need it". RISC-ish architectures tend to have enough registers that dedicating one to a link pointer isn't a big deal (and once you do spill it to the stack you can use the link register as a temporary register anyway).

Some very early CPU architectures didn't actually support either putting the return address in a register or on the stack. For instance, on the PDP-8 (https://en.wikipedia.org/wiki/PDP-8#Subroutines) the JMS instruction writes the return address to the first word of the subroutine it's about to call (and the actual subroutine entry point is just after that), which meant it didn't conveniently support recursion. It wasn't alone in that either -- I think that it just wasn't quite appreciated how important recursion/reentrancy was back in the early 60s when these ISAs were designed.

Sure, and SPARC has register windows, but it is still open to control-flow hijacking: once a return address is spilled to the stack, a buffer overflow is just as bad there.

Registers aren't all that valuable on architectures with reasonable numbers of them, and a lot of architectures do "branch and link" instead of an x86-style call. Branch and link generally means that control flow jumps elsewhere and the address of the next instruction is stored in a register. You jump back to that register to return. Functions are responsible for saving the link register if they clobber it.

This has at least one benefit over x86-style calls: a function like this:

  void some_leaf_function(void);  /* makes no calls of its own */

  void foo(void)
  {
      for (int i = 0; i < 10; i++)
          some_leaf_function();
  }

has to save its own return address to the stack, but it only needs to save it once, so all ten leaf calls can happen without stack access for the return address.
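
The return-address traffic in foo can be counted for n loop iterations (n = 10 in foo). The count assumes some_leaf_function keeps its return address in the link register, nothing is inlined, and other saved registers are ignored.

```latex
\begin{aligned}
\text{call pushes, ret pops:}\quad A_{\text{stack}}(n) &= \underbrace{2}_{\text{foo's own call}} + \underbrace{2n}_{\text{leaf calls}}, & A_{\text{stack}}(10) &= 22\\
\text{branch and link:}\quad A_{\text{link}}(n) &= \underbrace{2}_{\text{spill and reload LR in foo}}, & A_{\text{link}}(10) &= 2
\end{aligned}
```

The first count grows with every leaf call. The second stays fixed no matter how many times the loop runs.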

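Of course, modern x86 cores have specialized hardware to make calls cheap (e.g. a return stack buffer that predicts return targets, and a stack engine that handles the stack-pointer updates from push, pop, call and ret), so it's probably a wash in the end.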