It's not an unresolved question whether idiomatic Python is slower than idiomatic C/C++ for solving comparable problems. Python is much, much slower than C.
This is completely true. It is indeed well known and common wisdom.
However, I think the point the parent was trying to make is: Python is much slower than C and many other languages, however most of the time speed is unimportant. When it becomes important, there are many technologies to mitigate the problem in your "hot loops."
If speed is your primary concern, don't use Python et al. If it isn't your main concern, go ahead: it probably won't become an issue, and if it does, you will probably be able to get around it.
I think just saying "don't use Python where speed is important" misses the whole point of this slideshow. The author is saying: hey, if we thought about this and put in a few different features, we could make it a lot faster without programmers having to do much other than use an alternate implementation of some data structures.
While I'd agree that for 99% of us we're not going to find python/ruby/php/javascript to be a bottleneck that can't be mitigated, that's no reason to say it's not worth trying to make them faster. If we can make changes to these languages that will make them more efficient, why not do it?
> When it becomes important, there are many technologies to mitigate the problem in your "hot loops."
There is an implicit assumption there that most of the time in your program that could be saved is spent in a small number of hot spots. This will often be true, but unfortunately it is not necessarily so.
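Whether a given program actually has a few dominant hot spots is something you can measure rather than assume. A minimal sketch using the standard library profiler; the functions `cheap_setup`, `hot_loop` and `main` are made up for illustration and stand in for your own code:

```python
import cProfile
import io
import pstats


def cheap_setup():
    # Hypothetical cold path: a small amount of work.
    return list(range(1_000))


def hot_loop(values, repeats=200):
    # Hypothetical hot path: most of the runtime is spent here.
    total = 0
    for _ in range(repeats):
        for v in values:
            total += v * v
    return total


def main():
    values = cheap_setup()
    return hot_loop(values)


# Profile one call to main() and collect the report as text.
profiler = cProfile.Profile()
profiler.runcall(main)

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
report = stream.getvalue()
print(report)
```

If the cumulative time is concentrated in one or two entries, local optimization has a chance. If it is spread thinly over hundreds of functions, that is the "not necessarily so" case described above.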
This is a particular problem in languages like Python, which are useful (among other things) for their support for rapid prototyping and their easily readable code. All of that is lost if you can’t perform local optimizations to reach an acceptable level of performance, leaving a ground-up rewrite in a faster language like C as the next most likely strategy.
The kinds of techniques mentioned in the linked slides could help to create a middle ground that would be very useful for performance-sensitive projects that currently find themselves between a rock and a hard place.
>Python is much slower than C and many other languages, however most of the time speed is unimportant. When it becomes important, there are many technologies to mitigate the problem in your "hot loops."
I'm not convinced by this "speed is unimportant".
Well, if you're writing shell scripts in Python/Ruby etc, OK, it might be. It might not even be important in web programming.
But for using any language as a generic programming language speed is very important.
The reason you cannot build full blown desktop apps like a browser or GUI libraries in Python? Lack of speed and memory control. And yes, you could offload the work to some extension. And that's a barrier.
Suddenly knowing Python is not enough. You also have to learn, e.g., C, and you have a segmented program structure, with some stuff here and some stuff there. Or you relegate Python to just the scripting layer for your program and do the real stuff in C/C++ (like Adobe Lightroom uses Lua).
I don't want to mitigate the problem in my "hot loops" with another language. I want to not have that problem in the first place. That would make me more productive.
One example: imagine NumPy in pure Python.
For one, it would be trivial to include in your project. Without building anything, it would work on all platforms.
Second, it would be far more accessible for people who don't know C/Fortran/et al. to hack on.
Third, it would have been available for Python 3 or PyPy in a few months, not after several years.
Alright. Now, another way of achieving better speed is parallelism. But due to the bad support for it (GIL, lack of first class support) it's not easy to achieve this in CPython/MRI. Sure, you could use multiple processes but then you get all the issues of handling them and synchronising them with your own ad-hoc solution, and without first-class support from the language. Which is a barrier.
Yet another way to get more work done (for some kinds of programs) is evented code. So you have something like Node or Twisted. But Node doesn't have language support, so you get the "callback spaghetti", and Twisted and co. are external dependencies to the language, so they add another overhead.
Again, barriers.
People say "Speed doesn't matter" because they are trained by their language to only work on problems where speed doesn't matter. So it's more like a self-fulfilling prophecy.
Of course, if you constrain yourself to "convenient" domains that your language supports fast enough, speed doesn't matter. But take one step out of those and you are in need of crutches, from C extensions, to Cython, to Psyco, to NumPy, etc.
I was responding to @tptacek criticism of the parent not the deck. The deck is great and it mirrors the wisdom I have picked up from optimizing my own code over the years. I personally find it really frustrating not being able to easily pre-alloc lists in Python. I think that having better APIs would go a long way.
That will create a list of 100 instances of the same object.
object[0].x = 1
print object[1].x
> 1
Edit: On second read, it looks like you're asking something other than what I thought you were asking. Yes, you could create a list of 100 items and then replace its elements, but that's not idiomatic.
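For anyone following along, here is a small runnable sketch of that pitfall. The `Point` class is made up for illustration; the point is only that multiplying a list repeats references, while a comprehension builds separate objects:

```python
class Point:
    """Hypothetical mutable object used only for this illustration."""

    def __init__(self):
        self.x = 0


# List multiplication repeats the SAME object reference 100 times.
shared = [Point()] * 100
shared[0].x = 1
print(shared[1].x)    # 1, because every slot points at one Point

# A comprehension calls Point() once per slot, giving 100 distinct objects.
distinct = [Point() for _ in range(100)]
distinct[0].x = 1
print(distinct[1].x)  # 0, the other slots are unaffected
```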
But it is idiomatic in C, which is the point of the slide. C was built around a performance-focused idiom, which is to pre-allocate memory and then do in-place writes and swaps to mutate the buffer to the state you need it to be in. Python is built around an idiom of largely creating copies of objects and appending them to dynamically allocated lists. It's a much slower idiom.
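To make the two idioms concrete, here is a rough sketch of both in plain Python. The function names are made up. Both produce the same list; how much the second form helps in any given interpreter is something to measure with `timeit` rather than take from this sketch:

```python
def squares_append(n):
    # The usual Python idiom: start empty and grow the list as you go.
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def squares_prealloc(n):
    # The C-style idiom: allocate all n slots up front, then write in place.
    out = [None] * n  # safe here because None is immutable
    for i in range(n):
        out[i] = i * i
    return out


print(squares_append(5))
print(squares_prealloc(5))
```

Note that `[None] * n` only pre-allocates the slots of the list itself. Each element is still a separately allocated Python object, which is part of why the C idiom does not carry over cleanly.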
> It's not an unresolved question whether idiomatic Python is slower than idiomatic C/C++ for solving comparable problems. Python is much, much slower than C.
The real question is does it matter for a particular project.
Take a desktop GUI: does it matter if you write it in C++ so that the time from button click to status update is 5usec rather than 1msec?
If you are receiving 10 messages per second, parsing out JSON and sending back a response or saving it to disk, does it always matter that it all happens in 10msec instead of 11msec? Maybe it does, but I have found it often doesn't.
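A back-of-the-envelope version of the message example, assuming (my assumption, not stated above) that messages are handled one at a time by a single worker. The helper name `busy_fraction` is made up:

```python
def busy_fraction(messages_per_second, service_ms):
    """Fraction of each second spent handling messages."""
    return messages_per_second * service_ms / 1000.0


for service_ms in (10, 11):
    share = busy_fraction(10, service_ms)
    print(f"{service_ms} ms per message -> busy {share:.0%} of the time")
```

Under that assumption the process sits idle for roughly 89 to 90 percent of every second in both cases, so the extra millisecond only matters if something downstream has a hard per-message latency budget.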
[Edit: the parent originally had a sentence about not understanding why people like Python for Scientific Computing. This was my response to that. The parent has now removed the sentence.]
We (the people using Python for Scientific Computing) like Python for the following reasons:
1. NumPy+SciPy+matplotlib+cvxopt is a very speedy environment. Its only real competitor for what it provides is MATLAB. I have a colleague who benchmarked Python vs. MATLAB for our workload. Python is faster (often because some of the algorithms used are newer than the equivalents in MATLAB).
2. It is a very productive environment. We do a lot of evolutionary changes and prototyping. Doing this in C would slow us down in dev time. This is academic work, and mostly the code isn't important; the analysis is.
3. We generally know where the "hot loops" are, and that is what we focus on for optimization. This generally involves doing the math on paper, then implementing it. If you turn loops into matrix multiplications and use a good matrix library, you get a great speedup.
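As an illustration of point 3, here is a toy sketch of the same computation written as explicit Python loops and as a single matrix product. The function names and array sizes are made up; the only library call relied on is NumPy's `@` operator for matrix multiplication:

```python
import numpy as np


def product_with_loops(a, b):
    # Triple loop executed by the Python interpreter, one scalar at a time.
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for t in range(k):
                out[i, j] += a[i, t] * b[t, j]
    return out


def product_with_matmul(a, b):
    # The same math expressed as one matrix product, run inside NumPy.
    return a @ b


rng = np.random.default_rng(0)
a = rng.standard_normal((20, 30))
b = rng.standard_normal((30, 10))
print(np.allclose(product_with_loops(a, b), product_with_matmul(a, b)))
```

The "math on paper" step is recognizing that the loop nest is a matrix product in the first place. Once it is written that way, the inner loops no longer run in the interpreter.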
Sorry, I decided that it wasn't important before you replied. I am genuinely interested in why people use python for scientific computing, tho.
> I have a colleague who bench marked Python vs. Matlab for our workload. Python is faster
Is it also faster than C? From my limited experience, it seems that people sometimes spend a lot of time on concurrency when faster code would have been easier.
> This generally involves doing math on paper. Then implementing it.
Ah, yes, math always wins. This reinforces your point #2.
So, is #2 that much of a win? Do scientific programs spend more time in "development" than "production"?
> Is it also faster than C? From my limited experience, it seems that people sometimes spend a lot of time on concurrency when faster code would have been easier
It can reach FORTRAN speeds with the right tools. With Numba (http://numba.pydata.org/), your pure Python code gets compiled down to optimized machine code at call time, if your arguments are Numpy arrays. With NumbaPro (https://store.continuum.io/cshop/numbapro), we automatically parallelize for multi-core CPUs, and we emit CUDA/PTX for GPUs, and automatically exploit the parallelism in your data and algorithm.
The reason "higher level languages" can be faster than lower-level ones is because the compiler has more information about data parallelism. Typically "low level languages" are lower in that their type primitives are smaller, and hence the algorithms around those have turned vectorizable arrays into opaque for loops over arbitrary loop variables.
I certainly agree with you that many people now reach for distributed and parallel while leaving a lot of single-core and single-node performance on the table, mostly by ignoring the realities of memory bandwidth on modern CPUs. However, that level of efficiency is well within the reach of the Scientific Python stack. (See this blog post for how we're building a persistence format that respects memory hierarchy: continuum.io/blog/blz-format)
As a counterpoint: a last project I wrote in college was a machine learning algorithm. By a rough comparison it was on the order of 10000 times faster in C++ than the preexisting MATLAB implementation. The cause was that the performance bottleneck was not in large matrix operations. Instead, there were lots of iterative updates until convergence, which meant small vectors. A C++ template-based matrix library such as Eigen ends up inlining almost all of it into one no-allocation, dense bit of math that the traditional optimizer can milk for every last bit.
And it's not just about static/dynamic language differences here: practically, JIT might even do better by specializing the algorithm for a particular dimensionality, whereas that's impractical in C++ since you don't know the dimensionality until runtime.
Now, sometimes you can reduce your algorithm to some large-scale eigenvalue decomposition or whatever, and then numpy or similar might provide reasonable performance. But it's not a very general solution because performance on small structures is terrible (and iterative simple updates are common in many algorithms). JITted code relying on some underlying native library (like numpy) could never extract reasonable performance from this type of code; it would be forced to make many, many function calls in the innermost loop.
Good points, certainly, but just to clarify: Numba is not particularly dependent on Numpy's built-in vectorized and matrix operations. Instead, it's using the datatype information to do JIT type inference over the functions being called with the matrix/array arguments, and building machine code for them. You can call Numba JITted functions from other Numba JITted functions, and the overhead is the same as C functions calling each other.
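A rough sketch of what that looks like for the "many small iterative updates" case raised above. This uses the `njit` decorator from current Numba releases, which may differ from the decorators available when this thread was written. The update rule (shrinking a small vector toward zero) and the names `step` and `iterate` are made up for illustration, and the fallback only exists so the sketch still runs as plain Python when Numba is not installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run as ordinary Python.
    def njit(func):
        return func


@njit
def step(x, rate):
    # One small in-place update: gradient step on sum(x**2).
    for i in range(x.shape[0]):
        x[i] -= rate * 2.0 * x[i]


@njit
def iterate(x, rate, tol, max_iter):
    # Calls the other compiled function inside the innermost loop.
    for it in range(max_iter):
        step(x, rate)
        norm2 = 0.0
        for i in range(x.shape[0]):
            norm2 += x[i] * x[i]
        if norm2 < tol:
            return it + 1
    return max_iter


x = np.array([1.0, -2.0, 0.5])
iterations = iterate(x, 0.25, 1e-12, 1000)
print(iterations, x)
```

Nothing here goes through NumPy's vectorized operations. The loops are written out explicitly, which is the style described above as being compiled from the array type information. Whether it reaches the speed of a hand-written C++ version for a particular algorithm is something to benchmark, not something this sketch shows.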
There is no "production" in scientific programs. It runs once correctly to make the figure... More seriously, ontology is often a moving target, so the longer it takes to rewrite significant parts of the data structures, the less time there is to do science.
re: concurrency: I have a script that boots hundreds of IPython workers on hundreds of cores. I then make a client object (in another IPython shell), and map my 1e8 parameter configurations onto the cores, all in under a minute. This is much faster than rewriting in C.
I even implemented a special case of the brain simulator we've developed in Python (http://thevirtualbrain.org/) in C w/ unaliased pointer arithmetic etc. It's 50% faster but took more than 50% longer to write; on the other hand the PyCUDA implementation is 80x faster, and didn't take 80x, maybe 10x. Also a win because PyCUDA takes care of the uglier details.
> It's 50% faster but took more than 50% longer to write
At this point it's useful to know how long it takes to run, and how long to write. Is a run days long, months long, or years long? Or another way, is concurrency more expensive than a C re-programmer?
> Also a win because PyCUDA takes care of the uglier details.
Is there not an analogous C++ library to take care of ugly details?
(I actually like python a lot, so there's a bit of devil's advocate going on. But, my longest running python programs take less than an hour.)
Typical simulations for us take between half a minute and several days, but this can depend because it's typically necessary to do a parameter sweep in several dimensions (leading in extreme cases to runtimes of several months on a cluster).
I believe Thrust (now shipped w/ CUDA SDK) makes things easier, but (since you know Python) nothing like NumPy exists in C++, and PyCUDA maps NumPy seamlessly into GPU computing, which is a big win.
> Is there not an analogous C++ library to take care of ugly details?
No. In general, there isn't an analogous library in the more static, explicit languages (it doesn't matter much which library you choose). There are libraries that people use when they have similar requirements, but they are rarely analogous.
Does that mean we can't discuss what makes languages or their implementations performant without having a detailed conversation about the relevance of performance?
No, I'm in complete agreement with the OP, but you said
> Python is slower than idiomatic C/C++ for solving comparable problems
And when io and especially network is involved, that is not true. Your efficient C code can't make up for time lost elsewhere in the system. No one is clamoring for curl to be rewritten in assembly.
Is this really true? I have investigated a few performance / power problems caused by somebody using a bad networking API, or using a networking API incorrectly, despite doing the same amount of I/O.
It is also more likely to be possible to use efficient platform-specific APIs for things like zero-copy I/O in C than in a scripting language.
Zero-copy is often not actually a win, and it's uncommon in C code, too. The parent comment is right: I/O-bound programs tend to do just as well in slow languages as in fast ones. It's true that language performance often doesn't matter, just like it's true that parser designs don't matter if you just use sexprs for everything.
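One way to put numbers on the I/O-bound argument is Amdahl's law, which is my framing here and not something the comments above invoke. The sketch below assumes a hypothetical request that spends 5% of its wall-clock time in language-level computation and the rest waiting on the network or disk:

```python
def overall_speedup(compute_fraction, compute_speedup):
    """Amdahl's law: whole-program speedup when only one part gets faster."""
    waiting = 1.0 - compute_fraction
    return 1.0 / (waiting + compute_fraction / compute_speedup)


# Hypothetical workload: 5% compute, 95% waiting on I/O.
for factor in (2, 10, 100):
    total = overall_speedup(0.05, factor)
    print(f"compute {factor}x faster -> whole request {total:.3f}x faster")
```

With those made-up proportions, even a 100x faster language improves the whole request by only about 5%. The objection a couple of comments up still stands, though: this says nothing about the case where the slow part is the way the I/O API itself is being used.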