You would then add any function you would tail call optimize into that function. Granted, with the right structure (inside > outside >... > inside) it can show up more than once on the stack but the same thing can happen despite tail call optimization.
Mini-interpreter implementations do not count, for performance reasons. You could have called a VM bytecode interpreter loop a "loop alternative to tail call" here.
My first example of (f x) is about as fast as you can get.
The longer example is not great the point is it's a very language agnostic tradeoff of speed vs memory. Replace the ‘if’ conditionals with a case statement and its much faster you can speed it up further by using continue.
You can speed it up even more by having a jump at the end of each statement, but that stops looking like a loop and is basically just the tail call optimization.
Anyway, the important part is its low memory utilization and compiler independent.
PS: I in no way suggest tail call optimization was not useful; just you can mechanically simulate it when not available. You can usually beat that mechanical approach if you need to speed things up further.
> just you can mechanically simulate it when not available.
It's too slow to ever be anywhere near practical - for this reason, almost no JVM language implementation really does anything like this, besides kawa, bigloo and alike, which are slower than some of the dumb interpreters like SISC.
PS: Feel free to look for any counter example.