Hacker News new | ask | show | jobs
by xapata 3433 days ago
The lack of tail-call optimization to make the CPython interpreter simpler and debugging easier by preserving the call stack. It was a choice, not an oversight.
2 comments

From a debugging viewpoint, this does not make sense, there is usually no interesting information in the in between frames.

TCE can also make debugging easier, how useful is a stack trace of 1000 lines consisting of

  ...
  File "bla.py", line 4, in fib
    return fib(n - 1) + fib(n - 2)
  File "bla.py", line 4, in fib
    return fib(n - 1) + fib(n - 2)
  File "bla.py", line 4, in fib
    return fib(n - 1) + fib(n - 2)
...

Not so much I think.

I like how you picked an example that is explicitly not TCO-able.

In any case, this particular problem is no longer an issue as of Python 3.6, as that now collapses repeated stacktrace lines (see https://bugs.python.org/issue26823). Although this doesn't work for mutual tail calls, it does solve the debug noise issue in the most common case.

Ah yes, that is a bit stupid, I just wanted an example of a traceback :)
The notion that not having proper tail calls aids debugging always seemed like a post-hoc justification. The stack trace of an iterative function will lack exactly the same intermediate evaluation frames as a tail-recursive implementation.
The thing is, tail calls aren't _just_ about emulating iteration via recursion:

  def foo():
      raise ValueError

  def bar():
      return foo()

  bar()
With TCO, the stack trace would contain `main` and `foo`, as `bar`'s frame would be overwritten by `foo`. This example is simple, but `bar` could be a 50 line long if-else chain of tail calls and when debugging you won't necessarily know which condition was evaluated.
> The thing is, tail calls aren't _just_ about emulating iteration via recursion:

I completely agree, but there is also no need to perform TCO to make code like this safely runnable. TCO only becomes necessary/useful when implementing an iterative process where we can't statically know that the call stack won't be exhausted. That said, TCO is usually an all or nothing transformation, and it would be difficult to accurately avoid eliminating trivial tail calls like in your example.

A reasonable compromise might be for the Python VM to implement a TAIL_CALL bytecode op and require the programmer to decorate functions which rely on TCO. This wouldn't be any more onerous than manually trampolining large portions of code, which is the current method of getting around the lack of TCO.

A decorator that enabled TCO makes sense to me. Kind-of like the Numba project, it'd be a specialized JIT-compiler invoked only on some functions.

What's stopping that from being a 3rd-party library like Numba?

You can find simple decorators which try to provide space efficient tail recursion. Usually they work by trampolining the function. I've seen one example where a decorator rewrites the bytecodes to perform a goto in the case of self recursion. The problem is that all of these solutions are rather limited, easy to break, or have a pretty high runtime overhead. The general solution would be for a decorator to rewrite all CALL opcodes in tail postion to TAIL_CALL opcodes, but such an opcode currently does not exist. The actual implementation of a TAIL_CALL opcode would be almost idential to the CALL opcode, so adding it would probably be straightforward, but I'm speculating here.
Why not just make it a dev/production flag, then?
Probably because many tail-recursive functions _rely_ on tail-call elimination working reliably. Without also having an unbounded call stack, disabling tail-call elimination will likely just cause your programs to crash.