Hacker News
by zachbeane 5734 days ago
I sometimes see people ask about translating a language like Python to Common Lisp (or another language that can be compiled) as a kind of optimization.

The problem, in general, isn't that Python and languages like it don't have a compiler, it's that the semantics of the language are hostile to good performance by traditional means of compilation. To do what the programmer requests requires doing things at runtime that are hard to make fast. That's why things like tracing JITs are being used for things like JavaScript.
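
To make the point about semantics concrete, here is a minimal Python sketch. It is only an illustration, not anything from the compiler under discussion, and the `Money` class is a hypothetical name made up for the example. The same `a + b` can mean integer addition, float addition, string concatenation, or a call to user code, and that meaning can change while the program runs.

```python
# Illustrative only: why "a + b" cannot be compiled down to a single machine add.

def add(a, b):
    # Which operation runs here is decided by the runtime types of a and b.
    return a + b


class Money:
    """Hypothetical user-defined type, invented for this example."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)


print(add(1, 2))                           # integer addition
print(add(1.5, 2))                         # float addition (the int is coerced)
print(add("py", "thon"))                   # string concatenation
print(add(Money(150), Money(250)).cents)   # dispatches to Money.__add__

# The meaning of "+" can even be replaced while the program is running,
# so a compiler cannot assume the earlier definition still holds.
Money.__add__ = lambda self, other: Money(0)
print(add(Money(150), Money(250)).cents)
```

A static compiler that cannot prove which of these cases applies has to emit the runtime lookup for all of them, which is the work that is hard to make fast.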

The speedup you get from actually compiling Python programs is because the CPython interpreter is pretty awful, not because compilation is a magic solution to performance problems. The IronPython guy gave a nice explanation of this at OOPSLA 2007's Dynamic Languages Symposium - maybe things have changed in CPython since then.

2 comments

> The problem, in general, isn't that Python and languages like it don't have a compiler, it's that the semantics of the language are hostile to good performance by traditional means of compilation.

You are correct, but this approach (using libpython) is probably as good as you can do for static compilation. I did my PhD on a very similar compiler, just for PHP (phc - http://phpcompiler.org). Something like 90% of variables had known static types (and that excludes results of arithmetic which could be either reals or ints).

The best approach would be a hybrid. Throw as much static stuff at it as you can, then encode it for a JIT to use later. That's what I'm planning when I (eventually) get round to writing my language.

However, they only got a small speedup, just 50%, which doesn't sound like any static type information was really used. They'd be able to get something like 100 times faster FPU calculations, for example, if they knew the variable is only ever used as a number and nothing else. I know that factor because I measured the overhead of CPython in FPU calculations, and you can also see http://shootout.alioth.debian.org/u32q/benchmark.php?test=al....

I guess they still make every call that CPython would make for each variable reference (including checks like "is this a string or a number or..."), and that the savings come more or less just from having those calls encoded one after another, plus maybe some internal arguments needed for that, but not from knowing the types.

Compilation also helps distribution, though. Distributing a single compiled binary for a particular platform is a lot easier than telling people they need to have some particular version of Python and libraries installed.
I hear that idea put forth sometimes. When was the last time you downloaded a single compiled binary from someone? I don't think I've ever done that, except maybe for darcs.
Example: the Sysinternals tools for Windows were originally distributed as single compiled binaries, and this was far and away the best approach for those tools. Also, just because developers used installers on Windows doesn't mean some of them weren't making a boneheaded design decision in doing so, when their apps would have been more sensibly distributed as a single binary.

A single, statically-linked binary was (and probably still is, though .NET may be better now) the most portable way to distribute a program on Windows. You can't expect Perl to be there, and you can't expect Python or Ruby to be there. Java was usually there but not trustworthy: Java version conflicts on Windows are a nightmare. That's probably improved unless you're a business running legacy apps. DLLs were difficult to manage and library conflicts were common: http://en.wikipedia.org/wiki/DLL_hell.

Yes, you could use an installer, but that was unnecessary overhead if your application was not sufficiently complex. On Windows, you want to be able to say "download this, click on it, and when the GUI pops up, have fun."

This is not true on Unix. On Unix, a simple Perl script is likely to be far more portable than a single binary. Python and Ruby are reliable as well. C source code is as portable as any of the above (or more portable in some cases), though installation is more complex. Also, many of the utilities that would be convenient as downloadable binaries come standard with the OS (grep, find, awk, vi). Unix package managers are powerful and useful, and developers can trust that typical Unix users understand them well enough to use them.

Downloading compiled binaries from someone is what you usually do on Mac and Windows.
I've never done that; I always see either dmgs (on Mac) or installers of some sort (on Mac and Windows). The thing you download is a single file, but they expand into a lot of files. The Mac makes it look nicer with the .app idea, but you're still getting a bunch of files in there, one of which might be an interpreter for some of the other files.
You think an archive doesn't contain compiled binaries simply because it contains more than one file?
I think compilation to a single binary is not a big advantage when distributing to others. I think that because I've rarely seen anyone distribute software that way. If it's a widespread practice in some circles, I'd like to know more about the circumstances.
But at that point, who cares if it's compiled or not?
Though you could just bundle your program up with the interpreter and all the libraries to make this work. No translation to binary necessary.
Yes, you could do that. Whether that's the better strategy is an exercise left to the implementer.
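
For a rough feel of the bundling idea raised above, here is a minimal sketch using Python's standard `zipapp` module. It only packs the program into a single `.pyz` file and does not bundle the interpreter itself, so the target machine still needs Python installed; shipping the interpreter too requires third-party tooling not shown here. The app name `hello_app` and the helper `build_and_run` are hypothetical.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run():
    """Pack a tiny app into one .pyz file, run it, and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical application directory with a __main__.py entry point.
        src = pathlib.Path(tmp) / "hello_app"
        src.mkdir()
        (src / "__main__.py").write_text('print("hello from a single file")\n')

        # Produce a single-file archive from the directory.
        target = pathlib.Path(tmp) / "hello_app.pyz"
        zipapp.create_archive(str(src), str(target))

        # Running it still needs an interpreter; here we reuse the current one.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


print(build_and_run())
```

No translation to machine code happens anywhere in this sketch, which is the point being made: a single distributable file and compilation are separate concerns.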