Hacker News
by ram_rar 2863 days ago
I love Python, but it's seriously incapable of handling non-trivial concurrent tasks. The multiprocessing module doesn't count. I hope the Python core devs take some inspiration from Golang in developing the right abstractions for concurrency.
5 comments

Concurrent or parallel? For concurrency, Python has asyncio, which many people consider a success.

For parallel execution, there's the GIL, but in practice it rarely matters: once you want parallel execution, you have most likely a computationally intensive task to do, at which point you call down to C or something, and then the GIL doesn't matter.
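
A minimal sketch of the "call down to C" case, using only the standard library. It assumes hashing stands in for the computationally intensive task; `hashlib` is documented to release the GIL while hashing buffers larger than 2047 bytes, so plain threads can overlap here. The buffer sizes and the `digest` helper are made up for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(buf: bytes) -> str:
    # The hashing runs in C; hashlib drops the GIL for large inputs,
    # so several of these calls can make progress at the same time.
    return hashlib.sha256(buf).hexdigest()


# Four 1 MiB buffers (arbitrary illustration data).
buffers = [bytes([i]) * (1024 * 1024) for i in range(4)]

# Threaded version: one worker thread per buffer.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, buffers))

# Sequential version, for comparison of results only.
sequential = [digest(buf) for buf in buffers]
```

Pure-Python loops would not benefit from the same thread pool, which is the case the rest of the thread argues about.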

> most likely a computationally intensive task to do

Eh, let me stop you there. Everything isn't about performance.

Hardware- and UI-based things really benefit from parallelism.

I hope some of the (new?) concepts from trio find their way into the standard lib.

trio:

https://github.com/python-trio/trio

trio compared to asyncio, goroutines, etc.:

https://stackoverflow.com/a/49485603/1612318

"Notes on structured concurrency, or: Go statement considered harmful":

https://vorpus.org/blog/notes-on-structured-concurrency-or-g...

As a developer working mostly with Python, this comment makes no sense to me.

There are hundreds of libraries to deal with concurrency and/or parallelism in Python; asyncio, Celery, and PySpark are among the common ones.

All of them provide different approaches to concurrency because the language itself is not tied to one in particular.

These are all quite a lot harder to use than Go, and often they don't play well together. For example, there are lots of sync libraries (the Docker API, the AWS SDK, etc.) that can't be turned async, so other folks have had to go through the trouble of forking and porting them to async. Since those folks are often not affiliated with the original dev teams, who knows what the quality level of those libraries may be? We've also had a lot of problems with asyncio alone: developers often forget to await an async call, or do something (I'm not sure what exactly) that causes processes to hang indefinitely. It's all quite a lot more complex than Go's concurrency model.
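
To illustrate the "forgetting to await" problem mentioned above, here is a small sketch. The `fetch_value` coroutine is hypothetical; the point is only that calling a coroutine function without `await` hands back a coroutine object and runs none of its body.

```python
import asyncio
import inspect


async def fetch_value() -> int:
    # Hypothetical async call; the sleep stands in for real I/O.
    await asyncio.sleep(0)
    return 42


async def main():
    # Bug: no await. Nothing inside fetch_value has run yet;
    # we are holding a coroutine object, not the number 42.
    forgotten = fetch_value()
    is_coroutine = inspect.iscoroutine(forgotten)
    forgotten.close()  # close it explicitly to avoid a "never awaited" warning

    # Correct: await drives the coroutine to completion.
    awaited = await fetch_value()
    return is_coroutine, awaited


result = asyncio.run(main())
```

Python only reports the mistake as a `RuntimeWarning` when the unawaited coroutine is garbage collected, which is part of why it is easy to miss.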

And all of that is really just I/O parallelization; there's also CPU parallelization, and I don't believe Python has anything that's quite as easy as "Do these two things in parallel". Pretty much everything requires a lot of marshalling and process management, which can easily slow a program down instead of improving it.

Python is great for a lot of things, and the community has found many creative workarounds for its shortcomings, but Go beats Python in I/O and CPU parallelism handily.

While I agree that Python isn't ideal beyond a certain scope, I think you're overstating how bad it is. My team and I have built a number of non-trivial machine learning products with pipelines that use both the ThreadPool and ProcessPool components successfully. The headaches we have are related more to the fact that Python is dynamic than to its concurrency story.

OP probably is overstating a bit, but it is hard to efficiently parallelize computation in Python. For example, if you have a large Python object graph that you need to compute over, you can't easily parallelize the computation without paying some significant serialization cost. You can probably alleviate that by carefully choosing algorithms that minimize the amount of serialization per worker process, but at the end of the day, all of this is still quite a lot harder than using shared memory and goroutines. And not to mention Go is 1-2 orders of magnitude faster than Python in single-threaded execution... Python is great for lots of things, but efficient parallel programming in Python is _hard_, even if there are a handful of cases where it's not so hard.
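
A rough sketch of the serialization cost described above. Data sent to a worker process is pickled, so this does the pickle round trip by hand, without starting any processes. The tree shape and the `build_tree` and `count_nodes` helpers are made up for illustration.

```python
import pickle


def build_tree(depth: int, fanout: int = 3) -> dict:
    # A small nested object graph: every node is a dict with child nodes.
    node = {"value": depth, "children": []}
    if depth > 0:
        node["children"] = [build_tree(depth - 1, fanout) for _ in range(fanout)]
    return node


def count_nodes(node: dict) -> int:
    return 1 + sum(count_nodes(child) for child in node["children"])


root = build_tree(depth=6)

# What a process pool has to do to hand `root` to a worker:
payload = pickle.dumps(root)   # walk and serialize the whole graph
copy = pickle.loads(payload)   # rebuild it on the other side

# The worker gets its own copy, not shared memory:
copy["value"] = -1             # this change is invisible to the parent
```

Both the walk and the rebuild grow with the size of the graph, and results pay the same cost on the way back.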

I successfully do concurrent+parallel computing with Python using asyncio combined with ProcessPoolExecutor. I can see why perhaps that doesn't scratch your itch, but it sure scratches my web-crawling itch.
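
One way the asyncio plus ProcessPoolExecutor combination can look, as a sketch rather than the commenter's actual setup. The `fetch` and `count_words` functions and the example.com URLs are hypothetical stand-ins: the sleep replaces a real network request, and word counting replaces real parsing.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def count_words(page: str) -> int:
    # CPU-bound stand-in for parsing; a plain function so it can be pickled.
    return len(page.split())


async def fetch(url: str) -> str:
    # I/O-bound stand-in for a network request.
    await asyncio.sleep(0.01)
    return f"fetched body of {url}"


async def crawl(urls, executor=None):
    loop = asyncio.get_running_loop()
    # Concurrency: all fetches wait on I/O at the same time.
    pages = await asyncio.gather(*(fetch(url) for url in urls))
    # Parallelism: CPU work is handed to the executor.
    # With executor=None, asyncio falls back to its default thread pool.
    jobs = [loop.run_in_executor(executor, count_words, page) for page in pages]
    return list(await asyncio.gather(*jobs))


if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/b"]
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(crawl(urls, pool)))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes by re-importing the main module.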