by rdtsc 4697 days ago
That is true. I think date was more of an example of how to use threads. But yeah, the BeautifulSoup bit might actually run a bit slower if it is only doing parsing (unless there is a C extension underneath that releases the GIL).

Any discussion of Python and threads is always confusing and it seems to me there aren't many people who understand how it works (I know you do, not talking about your comment, just in general).

In one camp you have people who like to write that Python is no good and threads are just broken, so never use them. In the other camp are people who say threads are fine, they work great, and they have never had any issues with them. A lot of the time people in this camp are just reacting to the ones in the first camp, but also without understanding the underlying mechanism.

I think the first thing that should be mentioned in any introductory article on Python threads is that Python threads work great for IO concurrency but they won't help with CPU concurrency. Things like downloading files or sending data over a socket will work nicely. Things like computing determinants won't. You can still structure your code in a threaded fashion so you can use the multiprocessing module in the future, but you won't get a speedup. So threads are not completely unusable and broken, but they also have a surprising limitation.
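A rough sketch of that difference, using `time.sleep` as a stand-in for blocking IO (it releases the GIL while waiting) and a pure-Python loop as the CPU-bound work. The function names and sizes here are made up for illustration, and the timings will vary by machine and interpreter build:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    # Stand-in for a download or socket read: sleeping releases the GIL.
    time.sleep(delay)
    return delay


def cpu_work(n):
    # Pure-Python arithmetic: the thread holds the GIL while it computes.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(func, args, workers):
    # Run func over args on a thread pool; return (results, elapsed seconds).
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, io_serial = timed(fake_io, [0.2] * 4, workers=1)
    _, io_threaded = timed(fake_io, [0.2] * 4, workers=4)
    _, cpu_serial = timed(cpu_work, [500_000] * 4, workers=1)
    _, cpu_threaded = timed(cpu_work, [500_000] * 4, workers=4)
    print(f"IO:  1 thread {io_serial:.2f}s, 4 threads {io_threaded:.2f}s")
    print(f"CPU: 1 thread {cpu_serial:.2f}s, 4 threads {cpu_threaded:.2f}s")
```

On a standard GIL build of CPython I would expect the IO numbers to drop with more threads and the CPU numbers to stay about the same, but measure on your own setup.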

Overall in Python in my career I probably deal more with IO concurrency and threads helped there quite a bit. Others will have a different experience depending on their area of expertise.

Also it is worth mentioning that libraries like NumPy, and C extensions in general, have the option of releasing the GIL, and if they do, they can get a speedup. I have done this once by hand (with a hand-written extension) and it did help. I didn't personally test NumPy's speedup.
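As one illustration that needs only the standard library rather than NumPy: CPython's `hashlib` is documented to release the GIL while hashing larger buffers, so hashing from several threads can overlap on multiple cores. This is a sketch with hypothetical helper names and arbitrary buffer sizes; whether you see a real speedup depends on your machine:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(buf):
    # hashlib is implemented in C and releases the GIL for large inputs.
    return hashlib.sha256(buf).hexdigest()


def digest_all(buffers, workers=4):
    # Hash each buffer on a thread pool; results keep the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, buffers))


if __name__ == "__main__":
    # Four 8 MiB buffers, sizes chosen only for the demo.
    buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]
    for hexdigest in digest_all(buffers):
        print(hexdigest[:16])
```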

ADDITION:

It is also worth mentioning that even though you do not get a speedup for CPU-related concurrency, you still have to deal with synchronization issues. So you get the worst of both worlds. Just something to keep in mind.
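For instance, a shared counter still needs a lock even with the GIL, because `value += 1` is a read-modify-write that a thread switch can interrupt. A minimal sketch; the `Counter` class and `hammer` helper are invented for this example:

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write a single atomic step.
        with self._lock:
            self.value += 1


def hammer(counter, threads=8, per_thread=10_000):
    # Increment the same counter from several threads at once.
    def work():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(hammer(Counter()))
```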

2 comments

I think the discussion around Python threads revolves too much around performance. IMHO Python threads work great when used for what they are suited for: managing control flow, or making the code more readable. For example, if you have a task in your app that needs to be done every 5 minutes, you could just write something like:

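    import threading
    import time

    class ThreadClass(threading.Thread):
        def run(self):
            while True:
                do_stuff()  # placeholder for the periodic task
                time.sleep(5 * 60)  # wait five minutes between runs

and launch it in the background.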
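One possible way to launch and later stop such a thread, sketched with a daemon thread and a `threading.Event` instead of a bare sleep. The `PeriodicWorker` name, its parameters, and the `stop()` method are assumptions for illustration, not part of the snippet above:

```python
import threading


class PeriodicWorker(threading.Thread):
    def __init__(self, interval, task):
        # daemon=True so the thread does not keep the process alive on exit.
        super().__init__(daemon=True)
        self.interval = interval
        self.task = task
        self._stop_event = threading.Event()

    def run(self):
        while True:
            self.task()
            # wait() returns True as soon as stop() is called,
            # or False once the interval has elapsed.
            if self._stop_event.wait(self.interval):
                break

    def stop(self):
        self._stop_event.set()


if __name__ == "__main__":
    worker = PeriodicWorker(1.0, lambda: print("tick"))
    worker.start()              # runs in the background
    threading.Event().wait(3.5)  # main thread does other things meanwhile
    worker.stop()
    worker.join()
```

Waiting on the event rather than sleeping means `stop()` takes effect immediately instead of after the remaining interval.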
> I think the first thing that should be mentioned in any introductory article on Python threads is that Python threads work great for IO concurrency but they won't help with CPU concurrency. Things like downloading files, sending data over a socket will work nicely.

Python's threads certainly work acceptably for IO-bound purposes, but given the overhead of creating a real OS thread and the potential for GIL thrashing when using Python 2.x on a multicore machine, I'm not sure why you wouldn't favor a greenlet-based solution in most such cases, especially since you don't even really have to drop the threading idiom to do so.

Good point. I use gevent and am now switching back to eventlet. But that is a different post perhaps.

Not only do you get more threads with greenlet, you also don't have to worry about a whole class of synchronization side effects, since a greenlet will only switch contexts on an IO operation. (Now some might argue that is bad, since you could be calling a function and not know what happens inside it or what might happen in the future, so you should lock anyway.)
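The same cooperative idea can be sketched with the standard library's `asyncio`, which is a different mechanism from greenlet and is used here only because it needs no third-party install. Tasks switch only at explicit `await` points, so the unlocked read-then-write below cannot be interrupted. All names are made up for the example:

```python
import asyncio


async def bump(state, times):
    for _ in range(times):
        current = state["count"]      # read
        state["count"] = current + 1  # write: no await in between, so no switch
        await asyncio.sleep(0)        # explicit switch point, like an IO call


async def main(tasks=10, times=1000):
    state = {"count": 0}
    # Run all tasks concurrently on one thread, with no lock.
    await asyncio.gather(*(bump(state, times) for _ in range(tasks)))
    return state["count"]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If an `await` were moved between the read and the write, updates could be lost, which is the "you should lock anyway" concern.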

Out of curiosity, why the switch? I went with gevent pretty early on because it seemed like eventlet had some weird quirks, but I have to admit I never really gave it a close look.

Because it is supported on older Pythons on some servers I work on. It works with PyPy. It has fewer dependencies (just greenlet) and is easier to build.

gevent has also been thrashing around (is it at 1.0 beta?), switching to libev or libevent. And eventlet has picked up more steam, with more test coverage.

So no one big issue, just a bunch of small ones.