Hacker News new | ask | show | jobs
by openasocket 465 days ago
There's prior art in this, very very old prior art. If you check out the first volume of The Art of Computer Programming, it uses a fictitious architecture called MIX. I understand that while it was fictitious, it was made to be similar enough to contemporary architectures. In it there are no special stack manipulation instructions, because there is no first-class notion of a stack! Functions used global variables for their scratch space. To call a function you would just jump to that address. Of course that function needed to know where to jump back to when complete. To do that, before jumping, the caller would WRITE THE RETURN ADDRESS TO THE JUMP INSTRUCTION AT THE END OF THE FUNCTION. This seems kind of insane in the modern day (function calls requiring self-modifying code!) but it meant you could implement functions without needing even a single extra word of storage space, and all you really gave up was recursion. I believe the original Fortran and some other older languages were originally implemented this way.

There's definitely an advantage with this idea, but also some downsides. While you have a guaranteed upper bound on memory usage, it's completely static and actually going to be worse than doing the equivalent on the stack. Suppose you have 100 functions that each need 100 bytes of scratch space. Statically allocating everything like you describe means you need 10KB of memory reserved. But suppose there is one main function, and the rest are helper functions, and the main function calls each helper function one at time and the helper functions don't call anything. With the static approach, you still need 10KB. With the stack-based approach, you only need 200 bytes (the stack space for the main function, and the stack space for the helper function it is currently calling). Another advantage to the stack-based approach is due to caching. In the static approach, whether or not a function's scratch space is in the cache or not depends on how often that function is called. But in the stack-based approach, the top of the stack is almost always in the L1 cache, giving a speed boost to functions, even when they are not called frequently.

That said, I do think it is an interesting idea that is worth exploring.

2 comments

> Suppose you have 100 functions that each need 100 bytes of scratch space. Statically allocating everything like you describe means you need 10KB of memory reserved. But suppose there is one main function, and the rest are helper functions, and the main function calls each helper function one at time and the helper functions don't call anything. With the static approach, you still need 10KB. With the stack-based approach, you only need 200 bytes (the stack space for the main function, and the stack space for the helper function it is currently calling).

This is not necessarily the case: if you statically analyze the call graph, you may be able to prove that two functions will never be live at the same time, and thus reuse memory regions for their local variables: https://github.com/llvm-mos/llvm-mos/blob/main/llvm/lib/Targ...

Of course, recursion, function pointers, and dynamic linking will defeat (or at least complicate) this scheme.

That's a good point, you can definitely optimize. I'd be curious how far you could go using static analysis. Assuming no recursion, function pointers, dynamic linking, etc; can you optimize to the point that you use as little memory as a stack approach would? I think you ALMOST can! The only blocker would be specific call patterns that would require a lot of memory, but are unreachable. That's where the halting problem hits.

You just make a call graph where the nodes are functions and directed edges indicate which function calls which. Without recursion function pointers and the like you can compute this exactly and this forms a DAG. Each node is given a weight, which is the amount of local memory that function needs. Start at the entry point function, and compute the longest distance path (using the node weight).

Original Fortran did not modified the code. Rather each function had a global variable storing the address to return that the caller had to set.
The CDC 6600 of blessed memory had "return jump" instruction that wrote an unconditional jump back to the return address into the word prior to the first instruction of the called procedure. So one would implement a subroutine return by jumping to the word before the entry point.

(Fortran programmers are probably wondering how alternate ENTRY statements worked; yes, they had to allocate a word before the ENTRY, and copy whatever jump instruction was placed there by the hardware into the main subroutine's jump word, so that RETURN or END would work for all entry points. Recursive languages like Pascal had to copy that word to and from the stack. Reentrant code had to avoid using the return jump instruction entirely.)