Hacker News new | ask | show | jobs
by fluffything 2171 days ago
> Tail call removal. A recursive function that ends in a call to itself can often be rewritten as a loop, reducing call overhead and reducing the chance of stack overflow.

Most important: this optimization enables pipelined execution.

When people talk about a CPU executing an integer add instruction in ~1 cycle, what they actually mean is that the add has this latency when the CPU pipelines are full.

If you have an 11 stage pipeline... the add can often have a latency of ~11 cycles... if you write the _right_ code for it.

3 comments

FWIW, at least on intel cpu call instructions are fully pipelined and effectively zero latency (although, like all jump instructions there is a limit of one call every other cycle).
Interestingly, I've seen infinite recursions get optimized to infinite tail calls. This makes debugging harder because instead of a stack overflow you just have an infinite loop and have to manually go kill the process and get a breakpoint.

Then, looking at the code it's not obvious where the infinite loop occurs.

Tail call removal is much broader than that. Compilers will remove even indirect tail calls, which is useful when building a fast interpreter. I tried this with a small example (using Godbolt, of course), now let’s see if it works for a full interpreter.