Monday, April 27, 2009

Final Words on Tail Calls

A lot of people remarked that in my post on Tail Recursion Elimination I confused tail self-recursion with other tail calls, which proper Tail Call Optimization (TCO) also eliminates. I now feel more educated: tail calls are not just about loops. I started my blog post when someone pointed out several recent posts by Pythonistas playing around with implementing tail self-recursion through decorators or bytecode hacks. In the eyes of the TCO proponents those were all amateurs, and perhaps that's so.

The one issue on which TCO advocates seem to agree with me is that TCO is a feature, not an optimization. (Even though in some compiled languages it really is provided by a compiler optimization.) We can argue over whether it is a desirable feature. Personally, I think it is a fine feature for some languages, but I don't think it fits Python: The elimination of stack traces for some calls but not others would certainly confuse many users, who have not been raised with tail call religion but might have learned about call semantics by tracing through a few calls in a debugger.

The main issue here is that I expect that in many cases tail calls are not of a recursive nature (neither direct nor indirect), so the elimination of stack frames doesn't do anything for the algorithmic complexity of the code, but it does make debugging harder. For example, if your have a function ending in something like this:
if x > y:
return some_call(z)
else:
return 42
and you end up in the debugger inside some_call() whereas you expected to have taken the other branch, with TCO as a feature your debugger can't tell you the value of x and y, because the stack frame has been eliminated.

(I'm sure at this point someone will bring up that the debugger should be smarter. Sure. I'm expecting your patch for CPython any minute now.)

The most interesting use case brought up for TCO is the implementation of algorithms involving state machines. The proponents of TCO claim that the only alternative to TCO is a loop with lots of state, which they consider ugly. Now, apart from the observation that since TCO essentially is a GOTO, you write spaghetti code using TCO just as easily, Ian Bicking gave a solution that is as simple as it is elegant. (I saw it in a comment to someone's blog that I can't find back right now; I'll add a link if someone adds it in a comment here.) Instead of this tail call:
return foo(args)
you write this:
return foo, (args,)
which doesn't call foo() but just returns it and an argument tuple, and embed everything in a "driver" loop like this:
func, args = ...initial func/args pair...
while True:
func, args = func(*args)
If you need an exit condition you can use an exception, or you could invent some other protocol to signal the end of the loop (like returning None).

And here it ends. One other thing I learned is that some in the academic world scornfully refer to Python as "the Basic of the future". Personally, I rather see that as a badge of honor, and it gives me an opportunity to plug a book of interviews with language designers to which I contributed, side by side with the creators of Basic, C++, Perl, Java, and other academically scorned languages -- as well as those of ML and Haskell, I hasten to add. (Apparently the creators of Scheme were too busy arguing whether to say "tail call optimization" or "proper tail recursion." :-)

24 comments:

Muad`Dib said...

refreshing that you stuck by your intuitions rather than submit to these TCO-requesting functional fiends. It's hard to understand why the general public wants to make every programming language converge toward the same point!

Sutadoeda said...

I'd like to point out that you can (almost) have your cake and eat it too on debugging, by blowing everything on the stack except for the function name and line.

Recursive calls can be done even better, by keeping a name, line, and counter of the number of times visited. This is actually much nicer to debug than 1000 lines of 'File "Foo.py", line 42, in foo'.

I personally think you should open it up to whomever can make a patch that does it and still debugs.

ilyag said...

I remember trying to solve the precise problem you mentioned with state machines, an failing. Thank you for the very beautiful solution! It seems to work whenever one would need tail recursion.

I hope that this will be added to the python documentation sometime -- it is not obvious at all (at least to me), concise, and solves a problem quite a few people are having.

Eli Barzilay said...

Re Bicking's solution -- this is a very known method called tampoline; see here for example (second paragraph in the introduction).

(And no, this doesn't get you real TCO, unless you're willing to run all function calls through your trampoline, including library functions and built-ins.)

nathany said...

Been reading said book over the past few days. Definitely some reoccurring themes, like providing simple constructs to build on, and of course lots of parallelization talk. An unfortunate amount of other-language bashing too.

apgwoz said...

> (I'm sure at this point someone will bring up that the debugger should be smarter. Sure. I'm expecting your patch for CPython any minute now.)

So, if the debugger is smarter, you'll add support for TCO?

Flying Frog Consultancy Ltd. said...

You could have both stack traces and TCO by pushing debug frames during tail calls, unwinding to the last non-tail debug frame at returns and compressing the stack it if it gets too big. That will not work in pathological cases (e.g. two functions repeatedly tail calling either themselves or the other at random, because the stack will not compress) but it could get you some mileage without breaking backwards compatibility.

Trampolines are not a solution because they prevent extensibility (they are essentially a local change to the calling convention). Hence they are only seen in other languages that lack tail calls, such as Scala and Clojure.

I agree that tail calls do not belong in Python though. With inadequate support for most other language features and paradigms, I do not imagine they would be of any use. Anyone who wants to reap the benefits of modern techniques can just use any one of the modern language implementations that provide them (and a lot more).

grant rettke said...

Can't you compromise in regards to TCO by blowing away the entire stack but for the arguments, and only keeping track the last 1000 calls?

grant rettke said...

RE: "Can't you compromise in regards to TCO by blowing away the entire stack but for the arguments, and only keeping track the last 1000 calls?"

That is, if you want TCO, and you also want a stacktrace in a given language?

Matthew said...

There seem to be 2 sets here:

Programmers who program like Software Developers, and Programmers who program like computer scientists.

As a CS student, recursion was a critically important technique, and things like TCO where necessary language features.

As a Software Developer, I have used or seen recursion precisely 0 times in shipping code. And powerful debuggers and accurate stack traces are necessary language features.

Michael Merickel said...

All of Guido's complaints about TCO seem to involve debugging the code. Why can't TCO just be enabled in an optimized compile? Doesn't Python have that mythical -O and -OO that basically do nothing, currently?

shevegen said...

I am not a python dev nor a good python coder, however what I think proponents _for_ this could do is use it - or create a sublanguage which is very pythonic.

If this confuses people, they will tend to avoid it in their code anyway. And if it has a solid advantage, people will use it.

Tim Finin said...

I love 'Basic of the future' as a slur/compliment for a programming language.

d0mine said...

Re Bicking's solution -- James Tauber: Thunks, Trampolines and Continuation Passing

Isaac Z. Schlueter said...

"Why can't TCO just be enabled in an optimized compile?"

The problem is that TCO affects how you go about solving problems. In Erlang or Scheme, it makes sense to use tail calls very heavily and for just about everything. So, programs that work great with TCO would not work at all without it.

If Python enabled TCO sometimes, then that would mean that your debug build wouldn't run your program (or would show astronomical memory usage), but your production program would run fine.

That's not to say that it's impossible in theory to have some amount of stack traceability, and still have TCO. But, you won't have a stack trace that goes back to the very beginning—the best you could do is to keep track of the last N frames, which would have a reasonably constant amount of overhead.

sim said...

Daniel Friedman wrote a good explanation of trampolines and how to use them to code in continuation passing style in a language without TCO.

http://www.cs.indiana.edu/~dfried/dfried/mex.pdf

quag said...

TO: people suggesting some form of TCO+debuginfo.

A small depth of tail calls works with or without TCO. A large depth of tail calls required TCO, or the code to be rewritten as a loop.

The point of TCO is to allow a deep level of tail calls.

Now consider the TCO+debuginfo situation. The cost of the debuginfo might be less than the stack frame, but it is still of the same order as for tail calls without TCO; one piece of debug info per tail call. In other words, doing tail calls with TCO+debuginfo would still "leak" memory for each tail call.

A possible solution, to allow both deep tail calls and debug stack traces would be to only keep tail call info (or full stack frames) up to a max depth, and after that discard the debug info. Given that deep deep stack traces (think thousands or millions of calls) aren't that useful anyway, discarding after a threshold would be safe right? ;-)

Lucas said...

I can't begin to imagine the "basic of the future" as a serious slur. I remember when I was 12, playing around with QBasic. Not particularly fast, maybe doesn't teach the best programming habits, but a twelve-year-old could read it and write simple programs with only a few hours of instruction. If the "slur" is true, you'll introduce millions of teenagers to programming for the first time.

Jason said...

Quite a few of us started with basic as kids, and Ill be starting my children with python so I guess that isn't all bad.

proteusguy said...

I am very wary of the "you must incorporate my favorite language feature" calls that inevitably occur in language design efforts. Just sit in on the ISO C++ meetings some day if you want to see a formalized version of this.

Good languages have a design philosophy that provide its architectural drivers. Most follow up language design is best spent fixing past mistakes rather than adding new features. Any new feature proposal should come with an impression example implementation which really sends the point home if it is to be taken seriously.

For the rest who insist their language have all their fav features I would refer you to C#. It has them all - along with the logical consequences thereof.

Aaron said...

"[TCE] would certainly confuse many users, who have not been raised with tail call religion"

Guido - thanks for the clarification. While I agree with detractors that your first post had some inaccurate 'facts', you've definitely corrected yourself here and clarified things. There's the Python way/religion, and it's different than the functional way/religion.

I'm a fan of both scheme and python, and code differently in each. They're different tools for solving similar problems in different ways (hammer/nail, screw-driver/screw), and it would be incorrect to confuse the one for the other.

... that said, I wouldn't be offended if you ever changed your mind on the TCE issue. ^_^

Martin P. Hellwig said...

I fail to see why this is such an important feature that at least a significant amount of people have an opinion about it. What kind of problem would you rather solve with tail calls, something like implementing a xml parser?

FutureNerd said...

Not an optimization, not a feature, it's a fix to a memory leak bug.

For mainstream Python, it can't be an "optimization" or option for the whole program, but why not an explicit variation on "return"?

Anyone who wants to find out about the amazing things you can do once you fix that leak, could probably get a good start with MIT AI papers called "lambda the ultimate" this or that.

A chunk of that amazement is writing new control structures, like coroutines, schedulers, tree searchers, generators, exception handlers, and debuggers, as regular functions.

ArneBab said...

From a short readup on TCO in scheme, that while loop is just how quite a few schemes implement TCO internally.