What if writing performant code on modern Python implementations is only incrementally easier than writing it in C to begin with?
With the right libraries, the hard parts of C probably turn out to be string processing with zero-copy string idioms, the requirement to lay out every data structure in fiddly detail, the requirement to track individual allocations, and the requirement to manage the memory lifecycle. What if performant Python only gives you an advantage on the last one of those?
I would say that the difference between fast Python code and C is still quite large.
- the syntax is less error-prone
- ownership semantics are much clearer. You'll never segfault because you sent some memory into the wrong function
- not as much detail is needed for memory layout, the JIT abstracts a lot of it away
- there are high-level APIs handy
- development and distribution are simpler with one less language
- the barrier to optimising things is lower
I'm interested in this discussion. Which of those issues could you dispense with using more modern APIs and idioms in C? Look at Objective-C (mentally wipe off all the object goo), particularly NSMutableString, NSMutableArray, and NSMutableData, for examples of what I'm thinking about.
The C syntax we're stuck with. But how big a deal is that syntax?
Segfaults are mitigated if you don't expose pointers, except to the extent that C programmers have to think about memory lifecycle (like I said, I think this is indisputably a win for high level languages). Look at NSMutableString for an example of a C-style idiom that removes whole classes of pointer operations.
I dispute that JITs abstract away details about storage; they may allow you to not think about those details for code that doesn't need to be performant, and they can help the language get out of the way when you need to care about the storage details, but the question I'm asking is limited to performant code. There is no question that nonperformant Python code is way easier to write than any kind of C code!
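To make "caring about storage details" in performant Python concrete, here is a minimal sketch using only the standard library. It assumes CPython 3; the function names and the sample request bytes are made up for illustration and do not come from the presentation. It shows a zero-copy slice via memoryview and a compactly stored numeric array via the array module.

```python
from array import array


def zero_copy_prefix(buf, n):
    """Return a view of the first n bytes of buf without copying them."""
    return memoryview(buf)[:n]


def packed_doubles(values):
    """Store floats as raw C doubles instead of a list of boxed objects."""
    return array("d", values)


# Hypothetical sample data, chosen only for the illustration.
request = bytearray(b"GET /index.html HTTP/1.1")
method = zero_copy_prefix(request, 3)

# Mutating the underlying buffer is visible through the view,
# which shows that the slice shares memory rather than holding a copy.
request[0] = ord("P")
print(bytes(method))

samples = packed_doubles([1.0, 2.0, 3.0])
print(samples.itemsize * len(samples), "bytes of payload")
```

Either way the programmer is still choosing the representation by hand, which is the point under dispute: the runtime does not make that decision for you.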
There are better APIs available in Python than are commonly available to C or even ObjC, but that's a solvable problem. Let's stipulate better APIs, to the limit of what the language would allow (in other words, it's totally fair to say that the design of C/C++ would prohibit certain kinds of easy APIs).
Development and deployment are easier in some cases for Python (for instance, building on OS X and deploying on Linux), but far easier for C in others (for instance, building code that will run in a kernel or as a plugin in the address space of another process).
I dispute that the barrier to optimization is lower in Python for obvious reasons: C programmers can optimize without working around the exposed wires and ductwork of the language runtime. C programmers generally have an easier time optimizing than Python programmers; that is probably the #1 reason any Python programmer ever writes C.
As a current Golang programmer I agree strongly with the commenter below that when you take this idea and apply it to a new language you wind up with something that looks a lot like Go, which does work great. But I'm not advocating Go here.
I would say that the things you mentioned that we shouldn't count already add up to a lot (syntax, memory management, segfaults, security vulnerabilities). Another big one is that if you have these additional constructs in Python you can smoothly migrate from slow to fast code. You don't have to create a C file, rewrite your whole algorithm, create a build process to compile the C file (which you don't need with Python), and get the C functions to be callable from Python. In contrast, with the method proposed in this PyPy presentation you just change a couple of lines. If instead of advocating writing just the performance-critical parts in C you are advocating writing everything in C, then in addition to the issues you mentioned you're missing the high-level features of Python for the code that isn't performance-critical.
The woes of pointers (segfaults and security vulnerabilities) cannot be addressed in a library without a performance penalty. If you want a nice error message instead of a segfault or random memory overwrite you will have to pass around type information at run time. You could however have a production version of the stdlib that did not pass around type information, but that would only solve the issue at development time: the security vulnerabilities in production would still be there.
There is also an argument to be made that many of the optimizations mentioned in the presentation can be done automatically by the compiler/JIT. For example, JavaScript JITs already optimize small hash tables used as objects, since every JavaScript object is a hash table. Load forwarding followed by code motion can remove unnecessary intermediate allocations. And the square example should have been written as:
[i*i for i in xrange(n)]
This lets the implementation allocate the result list at the right size up front, before the loop runs.
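For comparison, here are the two ways of writing the squares example side by side, as a rough sketch in Python 3 (where range replaces xrange). The function names are mine, not from the presentation, and whether a given implementation actually preallocates the list for the comprehension is an implementation detail that this code does not demonstrate.

```python
def squares_append(n):
    """Build the list one element at a time; the list may be resized as it grows."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Same result as one expression; the final length is knowable from range(n)."""
    return [i * i for i in range(n)]


print(squares_comprehension(5))
```

Both functions return the same list. The comprehension simply hands the compiler or JIT the whole loop as one expression, which is what makes the sizing optimization possible in principle.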
I routinely program in Python and C, and syntax matters much to me.
My personal favorite feature of Python is simply the syntactic sugar that allows me to write stuff like "for element in array" without having to remember that an index exists. These little things add up fast when you're trying to focus on the problem at hand!
I actually was able to write a macro in C that, along with a certain paradigm for defining collections, allows foreach loops that are just as nice as Python.
The crux of the argument seems to come down to trading off optimised development time against optimised execution time. C with the right set of APIs could nail both of those, which is no slight to Python. Look at BIND: they've gone mixed C++/Python because they know you use the right language when you need it, and don't get religious.
You can control the details without it having to be overly fiddly. Certainly you can do better than C. You may end up doing similar things, but your code will be way easier to write and read. Go does this quite well.