Hacker News
by njharman 4860 days ago
Meh, MEH.

I'm almost never waiting on my python code. I'm waiting on network or disk or database or joe to check in his changes or etc.

I'm sure there are people who do wait. But that's why NumPy, C extensions, PyPy, Psyco, and similar things exist.

Python and more broadly "scripting" languages are for speed of development. Something else can take on speed of execution faster than 90% of people need it to be.

5 comments

It's not an unresolved question whether idiomatic Python is slower than idiomatic C/C++ for solving comparable problems. Python is much, much slower than C.
This is completely true. It is indeed well known and common wisdom.

However, I think the point the parent was trying to make is: Python is much slower than C and many other languages, however most of the time speed is unimportant. When it becomes important, there are many technologies to mitigate the problem in your "hot loops."

If speed is your primary concern, don't use Python et al. If it isn't your main concern, go ahead: it probably won't become an issue, and if it does, you will probably be able to get around it.

I think just saying don't use Python where speed is important is missing the whole point of this slideshow. The author is saying, hey, if we thought about this and put in a few different features we could make it a lot faster without programmers having to do much other than use an alternate implementation of some data structures.

While I'd agree that for 99% of us we're not going to find python/ruby/php/javascript to be a bottleneck that can't be mitigated, that's no reason to say it's not worth trying to make them faster. If we can make changes to these languages that will make them more efficient, why not do it?

When it becomes important, there are many technologies to mitigate the problem in your "hot loops."

There is an implicit assumption there that most of the time in your program that could be saved is spent in a small number of hot spots. This will often be true, but unfortunately it is not necessarily so.
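To put rough numbers on that assumption, here is a minimal sketch using Amdahl's law. The fractions and the 50x factor are made up for illustration, not measurements from any real program.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: speedup of the whole program when only the
    hot_fraction of its runtime is made hot_speedup times faster."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)

# Hypothetical profiles: fraction of runtime spent in code we could rewrite in C.
concentrated = overall_speedup(0.90, 50)   # one dominant hot loop
diffuse = overall_speedup(0.30, 50)        # time smeared across the program

print(round(concentrated, 2))  # 8.47
print(round(diffuse, 2))       # 1.42
```

With the same 50x rewrite, the program with a concentrated hot spot gets about 8x faster overall, while the one with diffuse costs barely moves.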

This is a particular problem in languages like Python, which are useful (among other things) for their support for rapid prototyping and their easily readable code. All of that is lost if you can’t perform local optimizations to reach an acceptable level of performance, leaving a ground-up rewrite in a faster language like C as the next most likely strategy.

The kinds of techniques mentioned in the linked slides could help to create a middle ground that would be very useful for performance-sensitive projects that currently find themselves between a rock and a hard place.

>Python is much slower than C and many other languages, however most of the time speed is unimportant. When it becomes important, there are many technologies to mitigate the problem in your "hot loops."

I'm not convinced by this "speed is unimportant".

Well, if you're writing shell scripts in Python/Ruby etc, OK, it might be. It might not even be important in web programming.

But for using any language as a generic programming language speed is very important.

The reason you cannot build full blown desktop apps like a browser or GUI libraries in Python? Lack of speed and memory control. And yes, you could offload the work to some extension. And that's a barrier.

Suddenly knowing Python is not enough. You've got to also learn, e.g., C, and you have a segmented program structure, with some stuff here and some stuff there. Or you relegate Python to just the scripting layer for your program and do the real stuff in C/C++ (like Adobe Lightroom uses Lua).

I don't want to mitigate the problem in my "hot loops" with another language. I want to not have that problem in the first place. That would make me more productive.

One example: imagine NumPy in pure Python.

For one, it would be trivial to include in your project. Without building anything, it would work on all platforms.

Second, it would be far more accessible to people that don't know C/Fortran/et al to hack it.

Third, it would have been available for Python 3 or PyPy in a few months, not after several years.

Alright. Now, another way of achieving better speed is parallelism. But due to the bad support for it (GIL, lack of first class support) it's not easy to achieve this in CPython/MRI. Sure, you could use multiple processes but then you get all the issues of handling them and synchronising them with your own ad-hoc solution, and without first-class support from the language. Which is a barrier.

Yet another way to get more work done, for some kinds of programs, is evented code. So you have something like Node or Twisted. But Node doesn't have language support, so you get the "callback spaghetti", and Twisted and co are external dependencies to the language, so they add another overhead.
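For a sense of what language-level support for evented code can look like, here is a small sketch using async/await, which Python gained in 3.5, well after this thread. It assumes Python 3.7+ for asyncio.run. The fetch function is hypothetical and asyncio.sleep stands in for a real socket wait.

```python
import asyncio

async def fetch(name, delay):
    """Hypothetical I/O operation; asyncio.sleep stands in for a socket wait."""
    await asyncio.sleep(delay)
    return name.upper()

async def main():
    # Sequential-looking code; the event loop interleaves the two waits.
    first, second = await asyncio.gather(fetch("a", 0.01), fetch("b", 0.01))
    return first + second

result = asyncio.run(main())
print(result)  # AB
```

The control flow reads top to bottom with no nested callbacks, which is the difference being pointed at.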

Again, barriers.

People say "Speed doesn't matter" because they are trained by their language to only work on problems where speed doesn't matter. So it's more like a self-fulfilling prophecy.

Of course, if you constrain yourself to "convenient" domains that your language supports fast enough, speed doesn't matter. But take one step outside them and you are in need of crutches, from C extensions, to Cython, to Psyco, to NumPy, etc.

The deck is about the performance of the language.
@chadcf and @tptacek

I was responding to @tptacek's criticism of the parent, not the deck. The deck is great and it mirrors the wisdom I have picked up from optimizing my own code over the years. I personally find it really frustrating not being able to easily pre-alloc lists in Python. I think that having better APIs would go a long way.

As the deck says:

"Line for line these languages are fast!"

"We need better no-copy/preallocate APIs"

"Take care in data structures"

Forgive the naive question, but why not:

    l = [object()] * 100
Perhaps the difference is stack vs. heap?
That will create a list of 100 instances of the same object.

  class Obj(object): pass
  l = [Obj()] * 100
  l[0].x = 1
  print(l[1].x)
  > 1
Edit: On second read, it looks like you're asking something other than what I thought you were asking. Yes, you could create a list of 100 items and then replace its elements, but that's not idiomatic.
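A short sketch of the difference, and of the "create then replace" approach mentioned above. The Point class is a made-up stand-in for whatever you actually store.

```python
class Point(object):
    """Hypothetical mutable element type, used only for illustration."""
    def __init__(self):
        self.x = 0

n = 5

# Multiplying a one-element list copies the reference, not the object.
shared = [Point()] * n
shared[0].x = 1
print(shared[1].x)              # 1, every slot is the same Point

# A comprehension calls Point() once per slot, giving distinct objects.
distinct = [Point() for _ in range(n)]
distinct[0].x = 1
print(distinct[1].x)            # 0

# Preallocate with a placeholder, then fill by index.
slots = [None] * n
for i in range(n):
    slots[i] = Point()
```

The last form fixes the list's length up front, but the elements themselves are still allocated one by one.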
And this comment thread is about why some people don't care in some applications. HN meta!
This comment thread is isomorphic to a comment thread on Packrat parsing with comments about how "I don't parse anything I just use sexprs".
> It's not an unresolved question whether idiomatic Python is slower than idiomatic C/C++ for solving comparable problems. Python is much, much slower than C.

The real question is does it matter for a particular project.

If it is a desktop GUI, does it matter if you write it in C++ and the time from button click to status update is 5usec or 1msec?

If you are receiving 10 messages per second, parsing out JSON and sending back a response or saving it to disk, does it always matter that it all happens in 10msec instead of 11msec? Maybe it does; I've found it often doesn't.

Yes battery power and general speed. On the other hand, it's hard to sacrifice more/better programs over speed; but IMO it does matter.
Meh, when there's an io call or a network request in front of the computation you'll never know.

EDIT: removed an additional comment about scientific computing that is now relevant as someone replied to it.

[Edit: the parent originally had a sentence about not understanding why people like Python for Scientific Computing. This was my response to that. The parent has now removed the sentence.]

We (the people using Python for Scientific Computing) like Python for the following reasons:

1. Numpy+Scipy+matplotlib+cvxopt is a very speedy environment. Its only real competitor for what it provides is MatLab. I have a colleague who benchmarked Python vs. Matlab for our workload. Python is faster (often because some of the algorithms used are newer than the equivalents in Matlab).

2. It is a very productive environment. We do a lot of evolutionary changes and prototyping. Doing this in C would slow us down in dev time. This is academic work, and mostly the code isn't important; the analysis is.

3. We generally know where the "hot loops" are, which is what we focus on for optimization. This generally involves doing math on paper, then implementing it. If you turn loops into matrix multiplications and use a good matrix library you get a great speedup.
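As a toy illustration of "math on paper first", here is a sketch that uses only the standard library. It is not the matrix-library case described above, just the same idea on a smaller scale: expand the expression by hand, and the O(n^2) double loop collapses into a single pass. The function names are invented for this example.

```python
def pairwise_sq_diff_loops(xs):
    """Direct O(n^2) double loop over all ordered pairs."""
    total = 0
    for a in xs:
        for b in xs:
            total += (a - b) ** 2
    return total

def pairwise_sq_diff_algebra(xs):
    """Same quantity after expanding (a - b)^2 on paper:
    sum_ij (x_i - x_j)^2 = 2*n*sum(x^2) - 2*(sum x)^2, an O(n) pass."""
    n = len(xs)
    return 2 * n * sum(x * x for x in xs) - 2 * sum(xs) ** 2

data = list(range(200))
assert pairwise_sq_diff_loops(data) == pairwise_sq_diff_algebra(data)
```

Recasting a nested loop as a matrix product works the same way: the algebra is done once by hand, and the remaining work goes to a routine that is already fast.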

Sorry, I decided that it wasn't important before you replied. I am genuinely interested in why people use python for scientific computing, tho.

I have a colleague who benchmarked Python vs. Matlab for our workload. Python is faster

Is it also faster than C? From my limited experience, it seems that people sometimes spend a lot of time on concurrency when faster code would have been easier.

This generally involves doing math on paper. Then implementing it.

Ah, yes, math always wins. This reinforces your point #2.

So, is #2 that much of a win? Do scientific programs spend more time in "development" than "production"?

> Is it also faster than C? From my limited experience, it seems that people sometimes spend a lot of time on concurrency when faster code would have been easier

It can reach FORTRAN speeds with the right tools. With Numba (http://numba.pydata.org/), your pure Python code gets compiled down to optimized machine code at call time, if your arguments are Numpy arrays. With NumbaPro (https://store.continuum.io/cshop/numbapro), we automatically parallelize for multi-core CPUs, and we emit CUDA/PTX for GPUs, and automatically exploit the parallelism in your data and algorithm.

The reason "higher level languages" can be faster than lower-level ones is because the compiler has more information about data parallelism. Typically "low level languages" are lower in that their type primitives are smaller, and hence the algorithms around those have turned vectorizable arrays into opaque for loops over arbitrary loop variables.

I certainly agree with you that many people now reach for distributed and parallel while leaving a lot of single-core and single-node performance on the table, mostly by ignoring the realities of memory bandwidth on modern CPUs. However, that level of efficiency is well within the reach of the Scientific Python stack. (See this blog post for how we're building a persistence format that respects memory hierarchy: continuum.io/blog/blz-format)

As a counterpoint; a last project I wrote in college was a machine learning algorithm. By a rough comparison it was on the order of 10000 times faster in C++ than the preexisting matlab implementation. The cause was that the performance bottleneck was not in large matrix operations; instead, there were lots of iterative updates until convergence; this meant small vectors; a C++ template-based matrix library such as Eigen ends up inlining almost all of it into one no-allocation dense bit of math the traditional optimizer can milk for every last bit.

And it's not just about static/dynamic language differences here: practically, JIT might even do better by specializing the algorithm for a particular dimensionality, whereas that's impractical in C++ since you don't know the dimensionality until runtime.

Now, sometimes you can reduce your algorithm to some large-scale eigenvalue decomposition or whatever, and then numpy or similar might provide reasonable performance. But it's not a very general solution because performance on small structures is terrible (and iterative simple updates are common in many algorithms). JITted code relying on some underlying native library (like numpy) could never extract reasonable performance from this type of code; it would be forced to make many, many function calls in the innermost loop.

There is no "production" in scientific programs. It runs once correctly to make the figure... more seriously, ontology is often a moving target, so the longer it takes to rewrite significant parts of the data structures, the less time there is to do science.

re: concurrency: I have a script that boots hundreds of IPython workers on hundreds of cores. I then make a client object (in another IPython shell), and map my 1e8 parameter configurations on to the cores, all in under a minute. This is much faster than rewriting in C.

I even implemented a special case of the brain simulator we've developed in Python (http://thevirtualbrain.org/) in C w/ unaliased pointer arithmetic etc. It's 50% faster but took more than 50% longer to write; on the other hand the PyCUDA implementation is 80x faster, and didn't take 80x, maybe 10x. Also a win because PyCUDA takes care of the uglier details.

so #2 is a big win

It's 50% faster but took more than 50% longer to write

At this point it's useful to know how long it takes to run, and how long to write. Is a run days long, months long, or years long? Or another way, is concurrency more expensive than a C re-programmer?

Also a win because PyCUDA takes care of the uglier details.

Is there not an analogous C++ library to take care of ugly details?

(I actually like python a lot, so there's a bit of devil's advocate going on. But, my longest running python programs take less than an hour.)

Does that mean we can't discuss what makes languages or their implementations performant without having a detailed conversation about the relevance of performance?
No, I'm in complete agreement with the OP, but you said

Python is slower than idiomatic C/C++ for solving comparable problems

And when io and especially network is involved, that is not true. Your efficient C code can't make up for time lost elsewhere in the system. No one is clamoring for curl to be rewritten in assembly.
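One rough way to check whether a given piece of code is in this situation is to compare wall-clock time against CPU time. This is a sketch only: handle_request is a hypothetical handler, and time.sleep stands in for a network round trip.

```python
import time

def handle_request(io_seconds=0.05):
    """Hypothetical handler: a simulated network wait followed by a
    little pure-Python work."""
    time.sleep(io_seconds)            # stands in for a network round trip
    return sum(i * i for i in range(10000))

wall_start = time.perf_counter()
cpu_start = time.process_time()
handle_request()
wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start

# Fraction of elapsed time the interpreter was actually computing.
print("wall %.3fs, cpu %.3fs, cpu share %.0f%%" % (wall, cpu, 100 * cpu / wall))
```

If the CPU share is small, a faster language only shrinks that small share; the waiting stays the same.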

Is this really true? I have investigated a few performance / power problems caused by somebody using a bad networking API, or using a networking API incorrectly, despite doing the same amount of I/O.

It is also more likely to be possible to use efficient platform-specific APIs for things like zero-copy I/O in C than in a scripting language.

Zero-copy is often not actually a win; it's uncommon in C code, too. The parent comment is right; I/O bound programs tend to do just as well in slow languages as in fast. It's true that language performance often doesn't matter, just like it's true that parser designs don't matter if you just use sexprs for everything.
True, but obvious.
I hate to feed someone on a troll, but you started this off with "Python is much, much slower than C." This comment thread is sponsored by obvious.
Speed in Python (or Ruby, or JS) isn't a big deal... until it is. When that happens, would you rather have to switch over to C and glue the resulting binary in (assuming you're not using JS, in which case you're just SOL), or would you rather have a high performance API at your fingertips for optimization when you need it?
Well, my usual answer there is to change the file extension to .pyx and see what Cython can do with a few type annotations. Usually the results are pretty good, and sometimes they're very good.
I think most of the gains you'd get from that would be orthogonal to the gains you would get from giving the JIT a little more information about allocations.
Does Cython give you much better results than PyPy? I would have thought that if just a few type annotations make a big difference tracing in PyPy could figure them out.
I don't know. We're using a bunch of C extensions that aren't trivially compatible with PyPy, so using it isn't really an option -- which is a pity, since PyPy sounds pretty amazing. Cython, on the other hand, integrates really well with CPython, and it can be as fast as C if you need it to be. I'm pretty happy with it.
If you're completely satisfied with Python's performance, that's wonderful. It means the talk wasn't aimed at you. Move along.
>Meh, MEH. I'm almost never waiting on my python code. I'm waiting on network or disk or database or joe to check in his changes or etc.

Meh, MEH. That's because you don't do anything involved with your Python code.

>I'm sure there are people who do wait. But that's why NumPy, C extensions, PyPy, Psyco, and similar things exist.

That they HAVE to exist could also be considered a sad state of affairs though. With a faster language you would just use the language, not external extensions and tricks.

Anything involved with what? What kind of specific task do you actually mean by 'involved'?

If you don't mind leaving Python's advantages on the table then use C in good health. Odds are that other people will be waiting on you to produce the C code, so let's hope you actually needed to do that.

Alex's point is that Python on PyPy is trivially comparable to those C extensions in speed. So why give up Python, ever, if JITs are this good?
What if writing performant code on modern Python implementations is only incrementally easier than writing it in C to begin with?

With the right libraries, the hard parts of C probably turn out to be string processing with zero-copy string idioms, the requirement to lay out every data structure in fiddly detail, the requirement to track individual allocations, and the requirement to manage the memory lifecycle. What if performant Python only gives you an advantage on the last one of those?

I would say that the difference between fast Python code and C is still quite large.

- the syntax is less error-prone
- ownership semantics are much clearer. You'll never segfault because you sent some memory into the wrong function
- not as much detail is needed for memory layout, the JIT abstracts a lot of it away
- there are high-level APIs handy
- development and distribution are simpler with one less language
- the barrier to optimising things is lower

I'm interested in this discussion. Which of those issues could you dispense with using more modern APIs and idioms in C? Look at Objective C (mentally wipe off all the object goo), particularly NSMutableString and NSMutableArray and NSMutableData, for examples of what I'm thinking about.

The C syntax we're stuck with. But how big a deal is that syntax?

Segfaults are mitigated if you don't expose pointers, except to the extent that C programmers have to think about memory lifecycle (like I said, I think this is indisputably a win for high level languages). Look at NSMutableString for an example of a C-style idiom that removes whole classes of pointer operations.

I dispute that JITs abstract away details about storage; they may allow you to not think about those details for code that doesn't need to be performant, and they can help the language get out of the way when you need to care about the storage details, but the question I'm asking is limited to performant code. There is no question that nonperformant Python code is way easier to write than any kind of C code!

There are better APIs available in Python than are commonly available to C or even ObjC, but that's a solvable problem. Let's stipulate better APIs, to the limit of what the language would allow (in other words, it's totally fair to say that the design of C/C++ would prohibit certain kinds of easy APIs).

Development and deployment are easier in some cases for Python (for instance, building on OS X and deploying on Linux), but far easier for C in others (for instance, building code that will run in a kernel or as a plugin in the address space of another process).

I dispute that the barrier to optimization is lower in Python for obvious reasons: C programmers can optimize without working around the exposed wires and ductwork of the language runtime. C programmers generally have an easier time optimizing than Python programmers; that is probably the #1 reason any Python programmer ever writes C.

As a current Golang programmer I agree strongly with the commenter below that when you take this idea and apply it to a new language you wind up with something that looks a lot like Go, which does work great. But I'm not advocating Go here.

I would say that the things you mentioned that we shouldn't count already add up to a lot (syntax, memory management, segfaults, security vulnerabilities). Another big one is that if you have these additional constructs in Python you can smoothly migrate from slow to fast code. You don't have to create a C file, rewrite your whole algorithm, create a build process to compile the C file (which you don't need with Python), and get the C functions to be callable from Python. In contrast, with the method proposed in this PyPy presentation you just change a couple of lines. If instead of advocating writing just the performance-critical parts in C you are advocating writing everything in C, then in addition to the issues you mentioned you're missing the high-level features of Python for the code that isn't performance critical.

The woes of pointers (segfaults and security vulnerabilities) cannot be addressed in a library without a performance penalty. If you want a nice error message instead of a segfault or random memory overwrite you will have to pass around type information at run time. You could however have a production version of the stdlib that did not pass around type information, but that would only solve the issue at development time: the security vulnerabilities in production would still be there.

There is also an argument to be made that many of the optimizations mentioned in the presentation can be done automatically by the compiler/JIT. For example Javascript JITs already optimize small hash tables used as objects, since every Javascript object is a hash table. Load forwarding followed by code motion can remove unnecessary intermediate allocations. And the square example should have been written as:

    [i*i for i in xrange(n)]
This can allocate the result list of the right size at the start of the allocation.
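For anyone who wants to measure this on their own interpreter, here is a sketch comparing three ways of building the same list. It is written for Python 3, where range is lazy like xrange. Whether a given implementation actually presizes the list for the comprehension is an implementation detail that this code does not assume either way.

```python
import timeit

def squares_append(n):
    """Grow the list one append at a time."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def squares_comprehension(n):
    """Let the comprehension build the list."""
    return [i * i for i in range(n)]

def squares_preallocated(n):
    """Size the list up front, then assign by index."""
    result = [None] * n
    for i in range(n):
        result[i] = i * i
    return result

n = 1000
assert squares_append(n) == squares_comprehension(n) == squares_preallocated(n)

# Timings vary by interpreter and machine, so none are asserted here.
for fn in (squares_append, squares_comprehension, squares_preallocated):
    print(fn.__name__, timeit.timeit(lambda: fn(n), number=200))
```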
I routinely program in Python and C, and syntax matters much to me.

My personal favorite feature of Python is simply the syntactic sugar that allows me to write stuff like "for element in array" without having to remember that an index exists. These little things add up fast when you're trying to focus on the problem at hand!
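A tiny side-by-side of the two loop styles, with made-up data:

```python
readings = [3.5, 4.0, 2.5]

# Index-based loop, in the style C pushes you toward.
total_indexed = 0.0
for i in range(len(readings)):
    total_indexed += readings[i]

# Direct iteration: no index to declare, bound, or increment.
total_direct = 0.0
for value in readings:
    total_direct += value

# When the position is needed, enumerate supplies it alongside the element.
labelled = ["%d: %.1f" % (i, value) for i, value in enumerate(readings)]
```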

I actually was able to write a macro in C that, along with a certain paradigm for defining collections, allows foreach loops that are just as nice as Python.
The crux of the argument seems to come down to trading on optimised development time versus optimised execution time. C with the right set of APIs could nail both of those, which is no slight to Python. Look at BIND, they've gone mixed C++/Python because they know you use the right language when you need it, and don't get religious.
You can control the details without it having to be overly fiddly. Certainly you can do better than C. You may end up doing similar things, but your code will be way easier to write and read. Go does this quite well.