by scott_s 4858 days ago
You are not looking for the best optimized performance since threads share memory within a process.

That is a non sequitur to me. The first half I'm on board with: generally, you use threads to improve performance, but because of the GIL in Python, you may not get the parallelism you want. If you're calling into libraries that release the GIL, then great, but that means you have to be very aware of what's going on below you.
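To make the GIL point concrete, here is a rough sketch, assuming a standard CPython build with the GIL. The helper names (`run_in_threads`, `blocking_call`, `pure_python_loop`) are made up for illustration. `time.sleep` stands in for a library call that releases the GIL while it waits; the pure-Python loop stands in for code that holds it.

```python
import threading
import time


def run_in_threads(worker, n_threads):
    """Run worker() in n_threads threads; return elapsed wall-clock seconds."""
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def blocking_call():
    # time.sleep releases the GIL while waiting, so the threads overlap.
    time.sleep(0.2)


def pure_python_loop():
    # Plain bytecode like this runs under the GIL in standard CPython,
    # so several threads running it take turns instead of running in parallel.
    total = 0
    for i in range(200_000):
        total += i
    return total


if __name__ == "__main__":
    print(f"4 threads sleeping: {run_in_threads(blocking_call, 4):.2f}s")
    print(f"1 thread looping:   {run_in_threads(pure_python_loop, 1):.2f}s")
    print(f"4 threads looping:  {run_in_threads(pure_python_loop, 4):.2f}s")
```

On a typical GIL build, the four sleeping threads should finish in roughly the time of one sleep, while four looping threads should take several times as long as one. The exact numbers depend on the machine and the interpreter version.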

The second half does not follow, though. Typically, the fact that threads share the same address space is the entire reason we use threads over processes. And the reason comes down to performance: if the threads share an address space, you don't need to copy the data. Copying data is expensive. (Sharing also means you're susceptible to a whole host of synchronization bugs.)
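A small sketch of the difference, with hypothetical helper names. A thread works on the very same object as its parent, so nothing is copied. The pickle round-trip here is only a stand-in for roughly what happens when an object is sent to another process through a `multiprocessing` queue or pipe: the receiver gets a serialized copy.

```python
import pickle
import threading


def mutate_via_thread(shared):
    """Append to `shared` from another thread; no copy is made."""
    def worker():
        # Same address space: this is the caller's list object itself.
        shared.append("from thread")

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return shared


def mutate_via_copy(original):
    """Approximate a process boundary: serialize, deserialize, then mutate."""
    clone = pickle.loads(pickle.dumps(original))  # this is the expensive copy
    clone.append("from copy")
    return clone


if __name__ == "__main__":
    data = [1, 2]
    mutate_via_thread(data)
    clone = mutate_via_copy(data)
    print(data)   # the thread's append is visible here
    print(clone)  # the copy's append is visible only on the copy
```

For two-element lists the copy is trivial; the cost matters once the shared data is large or has to be copied often.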


Sharing all data between threads means you're susceptible to a whole host of synchronization bugs (in the sense of thread synchronization, not data synchronization), unless you use synchronization primitives like locks to protect the shared data. Those locks can also easily kill concurrency. It's a trade-off.
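As a minimal sketch of that trade-off (the function name `count_with_lock` is made up for illustration): a `threading.Lock` makes the shared counter safe, but every increment now has to wait its turn, so the threads are effectively serialized on that section.

```python
import threading


def count_with_lock(n_threads=4, n_increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            # `counter += 1` is a read-modify-write; the lock ensures that
            # only one thread performs it at a time.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())
```

Without the lock, updates can be lost when two threads interleave their read and write steps; whether that shows up in a given run depends on the interpreter version and timing. With the lock the result is always `n_threads * n_increments`, at the cost of the threads contending for the same lock on every iteration.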

If avoiding copies is not your top problem, then sharing memory to avoid them may be a waste of your time; there's nothing wrong with using abstractions more appropriate to your environment.

If the program scales out, it should be less important to micro-optimize inside each process because it's so much cheaper just to use another core or another node.

It's getting boring to hear all discussions of concurrency reduced to threads, and threads reduced to the GIL in CPython. It's really not that simple.

Yes, it's a trade-off, which is why I brought it up.

But my point here is that the statement the author made, as far as I'm able to understand it, makes no sense. That is, I think he tried to discuss these issues, but I don't think he understands them well enough to do so. I think you and I are in agreement, unless you are saying that what the author stated does make sense.