Hacker News
by exDM69 4858 days ago
> Generally, you should only use threads if the following is true: - Sharing memory between threads is not an issue.

Here's the problem. Threads are really useful only if you can share memory between threads. If you can't share memory, you're usually better off using many processes.

Threads in Python (i.e. CPython) can still be useful for I/O multiplexing or for executing native code in background worker threads via FFI, releasing the GIL while doing so. For I/O multiplexing, there are better options than Python threads (the select/poll/kqueue/epoll system calls and frameworks like Twisted that use them).
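
To make the multiplexing point concrete, here is a minimal sketch using the standard library's selectors module, which picks epoll, kqueue, poll or select depending on the platform. The function name multiplex_read and the use of socket.socketpair() as a stand-in for real network connections are assumptions for illustration only.

```python
import selectors
import socket


def multiplex_read(messages):
    """Read several sockets to EOF from a single thread.

    Each message is written into one end of a local socket pair
    (a stand-in for a remote peer); the other end is registered
    with a selector and drained as it becomes readable.
    """
    sel = selectors.DefaultSelector()
    for index, message in enumerate(messages):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=index)
        writer.sendall(message)
        writer.close()  # the reader sees EOF after the payload

    received = {index: b"" for index in range(len(messages))}
    open_readers = len(messages)
    while open_readers:
        # Blocks until at least one registered socket is readable.
        for key, _events in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                received[key.data] += chunk
            else:
                # Empty read means the peer closed its end.
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_readers -= 1
    sel.close()
    return [received[index] for index in range(len(messages))]
```

One thread services every connection here, so there is no shared state to lock; the cost is that the control flow becomes an event loop, which is the part frameworks like Twisted package up.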

In most applications, threads probably should not be used in CPython/CRuby code as they provide little performance gain compared to the complexity and overhead they add.

2 comments

http://en.wikipedia.org/wiki/Communicating_sequential_proces... works well passing immutable object graphs back and forth. Passing by value (copying everything) has a cost, and serializing everything down to byte streams across a pipe is even more expensive, especially if you don't know which portions of the object graph will and won't be needed for a given call (an optimization which imposes tight coupling on details about the code you're calling). If I'm not calling untrusted code, and not planning to divide the work across many machines, I'd prefer to avoid needless process boundaries.
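
A rough sketch of that cost, assuming pickle as the wire format (it is what Python's multiprocessing uses by default when sending objects between processes). The helper round_trip and the sample graph are hypothetical; the point is only that the whole graph gets serialized and rebuilt even when the receiver needs a single field.

```python
import pickle


def round_trip(obj):
    """Simulate sending obj across a process boundary.

    Returns the rebuilt copy and the number of bytes that would
    have to travel over the pipe.
    """
    payload = pickle.dumps(obj)
    return pickle.loads(payload), len(payload)


# Hypothetical object graph; suppose the callee only needs "title".
graph = {"title": "report", "rows": [list(range(10)) for _ in range(1000)]}

copy, graph_bytes = round_trip(graph)
title_only, title_bytes = round_trip(graph["title"])

# The copy is equal in value but shares nothing with the original,
# whereas a thread would simply receive a reference to `graph`.
print(copy == graph, copy is graph)
print(graph_bytes, title_bytes)
```

Sending only graph["title"] is far cheaper, but that is exactly the optimization that requires the caller to know what the callee will touch.
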
Thank you. I would even go so far as to say that, except in simple cases (downloading 10,000 images goes much faster with 100 worker threads than serially, which I think is the origin of "don't share memory"), you should not use Python or any similar language for this.
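
For the simple case, a sketch along these lines is usually enough. It assumes concurrent.futures from the standard library; fake_download is a hypothetical stand-in that just sleeps, since time.sleep releases the GIL much as a blocking network read would.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Hypothetical stand-in for a network fetch: wait, then return data."""
    time.sleep(0.05)
    return f"contents of {url}"


def download_all(urls, fetch=fake_download, workers=100):
    """Fetch every URL with a pool of worker threads.

    Results come back in the same order as `urls`. The workers share
    no mutable state; each one only returns its own result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

Calling download_all(urls) with 100 workers overlaps the waiting, so total time is roughly the slowest batch rather than the sum of all fetches. It stops helping once the work per item is CPU-bound Python code.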

Got parallel needs at your core? Look at Erlang or Haskell. If parallel or distributed work is mission critical, go with a language that has such things at its very soul. Python is a great language, but it is being enthusiastically bent to do things it is not top of the class for.

Want to handle more concurrent connections per Python web server? If WSGI in Gunicorn is not enough, stop trying and use a load balancer to spread work between more servers.

You are technically correct. However, there's at least one invalid assumption at the core of this, which is that people who need to do things that fall into the there's-a-better-language-for-this category always have the opportunity to learn and implement a more appropriate tool.

That assumption is almost always invalid on commercial projects. Extremely few companies and clients will be perfectly fine with "yes, I'm a Python expert, but this would be best done in Erlang; I will need an extra week to research, learn, and implement this on top of the month the project would otherwise take." In most situations you either do it the way you know how to do it, eat the extra time (not practical in most cases), or you lose the contract/job.

Of course this is specific to client work, but I think most of us are likely doing that or something similarly limiting for at least half our waking hours, making it fairly relevant when considering ideas like "using tool X for job A is not a good idea when tool Y exists." It's correct but ignores too many practical situations to be very useful advice.

I used to think that, but I now believe we can find the clients who want it done right more than right now.

To be fair, the best way of judging this is the reverse penalty clause. So this job must be done by June 1? OK, and if it is three weeks late because we use Erlang? A penalty of 1,000 dollars a day? Wow. OK, so if I am a month early, you can pay a bonus of 20,000? No? So perhaps we are not as time-critical as we feared. Would you rather save 20,000 in ongoing maintenance costs and general uncertainty over how good the solution is, in exchange for a three-week delay that would likely creep in anyway?

Have I told you Erlang has an uptime of 99.999% proven over twenty years?