Hacker News
by pansa2 631 days ago
> What is the real world benefit we will get in return?

If you have many CPU cores and an embarrassingly parallel algorithm, multi-threaded Python can now approach the performance of a single-threaded compiled language.

3 comments

The question really is whether one couldn't make multiprocess better instead of multithreaded. I did a ton of MPI work with Python ten years ago already.

What's more I am now seeing in Julia that multithreading doesn't scale to larger core counts (like 128) due to the garbage collector. I had to revert to multithreaded again.

I assume you meant you had to revert to multiprocess?
Yes exactly. Thanks.
You could already easily parallelize with the multiprocessing module.

The real difference is the lower communication overhead between threads vs. processes thanks to a shared address space.
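
To make the shared address space point concrete, here is a minimal sketch. The names `fill`, `square`, `run_threads` and `run_processes` are made up for illustration. The threads write into one dict that lives in the parent's memory, while the pool has to pickle each argument over to a worker process and pickle the result back.

```python
import threading
from multiprocessing import Pool


def fill(shared, key):
    # Threads write straight into the caller's dict: same address space, no copy.
    shared[key] = key * key


def square(n):
    # In a worker process the argument arrives pickled and the result is pickled back.
    return n * n


def run_threads(keys):
    shared = {}
    threads = [threading.Thread(target=fill, args=(shared, k)) for k in keys]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared


def run_processes(keys):
    with Pool(2) as pool:
        return dict(zip(keys, pool.map(square, keys)))


if __name__ == "__main__":
    keys = [1, 2, 3]
    print(run_threads(keys))
    print(run_processes(keys))
```

For tiny integers the pickling cost is negligible; it starts to matter once the arguments or results are large.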

Easily is an overstatement. Multiprocessing is fraught with quirks.
Well I once had an analytics/statistics tool that regularly chewed through a couple GBs of CSV files. After enough features had been added it took almost 5 minutes per run which got really annoying.

It took me less than an hour to add multiprocessing to analyze each file in its own process and merge the results together at the end. The runtime dropped to a couple of seconds on my 24-thread machine.

It really was much easier than expected. Rewriting it in C++ would have probably taken a week.

In F#, it would just be

    let results = files |> Array.Parallel.map processFile
Literally that easy.

Earlier this week, I used a ProcessPoolExecutor to run some things in their own process. I needed a bare minimum of synchronization, so I needed a queue. Well, multiprocessing has its own queue. But that queue is not joinable. So I chose the multiprocessing JoinableQueue. Well, it turns out that that queue can't be used across processes. For that, you need to get a queue from the launching process' manager. That Queue is the regular Python queue.

It is a gigantic mess. And yes, asyncio also has its own queue class. So in Python, you literally have half a dozen or so queue classes that are all incompatible, have different interfaces, and have different limitations that are rarely documented.

That's just one highlight of the mess between threading, asyncio, and multiprocessing.
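
For anyone hitting the same wall, a rough sketch of the manager-queue approach described above. `worker` and `run` are hypothetical names and the doubling is a stand-in task. As far as I know, passing a plain `multiprocessing.Queue` as an argument to pool workers raises a `RuntimeError`, whereas the proxy returned by `Manager().Queue()` can be pickled and handed over.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


def worker(queue, item):
    # Hypothetical task: push a derived value onto the shared queue.
    queue.put(item * 2)
    return item


def run(items):
    with Manager() as manager:
        # Proxy to a queue living in the manager process; safe to pass as an argument.
        queue = manager.Queue()
        with ProcessPoolExecutor(max_workers=2) as pool:
            # Consume the iterator so every task has finished before draining.
            list(pool.map(worker, [queue] * len(items), items))
        results = []
        while not queue.empty():
            results.append(queue.get())
    return sorted(results)


if __name__ == "__main__":
    print(run([1, 2, 3]))
```

Every `put` and `get` here is a round trip to the manager process, so this is fine for a bare minimum of synchronization but not for moving lots of data.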

Well I'm not here to debate the API cleanliness, I just wanted to point out to OP that Python can utilize multicore processors without threads ;)

Here is the part of multiprocessing I used:

  from multiprocessing import Pool

  with Pool() as p:
      results = p.map(calc_func, file_paths)
So, pretty easy too IMO.
Fraught with quirks sounds quite ominous. Quuuiiirkkksss.

I agree though.

That's not really correct. Python is by far the slowest mainstream language. It is embarrassingly slow. Furthermore, several mainstream compiled languages are already multicore compatible and have been for decades. So comparing against a single-threaded language or program doesn't make sense.

All this really means is that Python catches up on decades-old language design.

However, it simply adds yet another design input. Python's threading, multiprocessing, and asyncio paradigms were all developed to get around the limitations of Python's performance issues and the lack of support for multicore. So my question is, how does this change affect the decision tree for selecting which paradigm(s) to use?

> Python's threading, multiprocessing, and asyncio paradigms were all developed to get around the limitations of Python's performance issues and the lack of support for multicore.

Threading is literally just Python's multithreading support, using standard OS threads, and async exists for the same reason it exists in a bunch of languages without even a GIL: OS threads have overhead, multiplexing IO-bound work over OS threads is useful.

Only multiprocessing can be construed as having been developed to get around the GIL.
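
A small sketch of what multiplexing IO-bound work looks like with asyncio. `fake_request` is a made-up coroutine, with `asyncio.sleep` standing in for a network wait. It records the thread each task ran on to show that all 100 waits share one OS thread.

```python
import asyncio
import threading


async def fake_request(name, delay):
    # asyncio.sleep stands in for a network wait (hypothetical workload).
    await asyncio.sleep(delay)
    return name, threading.get_ident()


async def main():
    # 100 concurrent waits, all multiplexed on the thread running the event loop.
    return await asyncio.gather(*(fake_request(i, 0.01) for i in range(100)))


def run():
    results = asyncio.run(main())
    thread_ids = {tid for _, tid in results}
    return len(results), len(thread_ids)


if __name__ == "__main__":
    print(run())  # (number of results, number of distinct threads used)
```

The same pattern would apply in a runtime without a GIL, which is the point: the motivation is the per-thread overhead, not the lock.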

No, asyncio's implementation exists because threading in Python has huge overhead for switching between threads and because threads don't use more than one core. So asyncio was introduced as a single-threaded solution specifically for only network-based IO.

In any other language, async is implemented on top of the threading model, both because the threading model is more efficient than Python's and because it actually supports multiple cores.

Multiprocessing isn't needed in other languages because, again, their threading models support multiple cores.

So the three, relatively incompatible paradigms of asyncio, threading, and multiprocessing specifically in Python are indeed separate attempts to account for Python's poor design. Other languages do not have this embedded complexity.

> In any other language, async is implemented on top of the threading model

There are a lot of other languages. JavaScript, for example, is a pretty popular language where async on a single-threaded event loop has been the model since the beginning.

Async is useful even if you don't have an interpreter that introduces contention on a single "global interpreter lock." Just look at all the languages without this constraint that still work to implement async more naturally than just using callbacks.

Threads in Python are very useful even without removing the GIL (performance-critical sections have been written as extension modules for a long time, and often release the GIL).
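
One stdlib illustration of that, as a sketch: `hashlib` is a C extension and its documentation says it releases the GIL while hashing large buffers, so a plain thread pool can overlap the hashing work. `digest` and `hash_all` are names I made up, and the input data is arbitrary.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(chunk):
    # hashlib is implemented in C and, per its docs, releases the GIL for large inputs.
    return hashlib.sha256(chunk).hexdigest()


def hash_all(chunks, workers=4):
    # Results come back in input order, regardless of which thread finishes first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, chunks))


if __name__ == "__main__":
    chunks = [bytes([i]) * 10_000_000 for i in range(8)]  # made-up data
    print(hash_all(chunks)[:2])
```

How much speedup you actually see depends on core count and buffer size, so treat it as something to measure rather than assume.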

> are indeed separate attempts to account for Python's poor design

They all have tradeoffs. There are warts, but as designed it fits a particular use case very well.

Calling Python's design "poor" is hubris.

> So my question is, how does this change affect the decision tree for selecting which paradigm(s) to use?

The only effect I can see is that it reduces the chances that you'll reach for multiprocessing, unless you're using it with a process pool spread across multiple machines (so they can't share address space anyway).

> Calling Python's design "poor" is hubris.

Not in the least. Python is a poorly designed language by many accounts. Despite being the most popular language in the world, what language has it significantly influenced? None of note.

> Python is a poorly designed language by many accounts

Hubris isn't rare.

> what language has it significantly influenced?

I can think of at least one language designer[1] who doesn't think it's "poorly designed," based on its significant impact on what they're currently working on[2].

1. https://en.m.wikipedia.org/wiki/Chris_Lattner
2. https://www.modular.com/mojo

Who cares about how many other languages a language has influenced? If that was a metric of any consideration we all would write Algol or something. Programming languages are tools, tools to help you perform a task.
> Python is by far the slowest mainstream language. It is embarrassingly slow.

Oh? It is by far the fastest language for me. No language comes close on the time from starting to write to having code that runs. For me that time far outweighs the execution time, so it is a lot more important.

May I ask what field you specialize in? Because any modern language I can think of is one "project init" command away from "nothing" to "running".