| To be clear, Popen is very different from all the other options: it's for running other programs. Process is low-level and is almost never what you want. Pool is "mid-level" and usually isn't what you want either. ProcessPoolExecutor is usually what you want; it is the "one obvious way to do it", though that's not at all clear from the docs. In general, the one obvious way to do it is: subprocess.run for running external processes, subprocess.Popen for async interaction with external processes, and concurrent.futures.ProcessPoolExecutor for Python multiprocessing. Your other complaints about actually using the multiprocessing stuff are 100% valid. Error handling, cancellation, etc. are all very difficult. Passing data back and forth between the main process and subprocesses is not trivial. But I do want to emphasize that there is a somewhat-well-defined gradient of lower- and higher-level tools in the standard library, and your "obvious way to do it" should usually start at the higher end of that gradient. You might also want to look into the third-party Joblib library, which makes process parallelism a lot less painful for the straightforward use case of "run a function on a large amount of data, using multiple OS processes." |
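For anyone skimming, here is a rough sketch of the two high-level entry points mentioned above: subprocess.run for an external program and ProcessPoolExecutor for running a Python function across processes. The function names (`square`, `run_external`, `run_parallel`) and the child command are made up for illustration; treat it as a starting point under those assumptions, not a complete pattern for error handling or cancellation.

```python
import subprocess
import sys
from concurrent.futures import ProcessPoolExecutor


def square(n: int) -> int:
    # Hypothetical CPU-bound worker. It lives at module level so that
    # it can be pickled and sent to the worker processes.
    return n * n


def run_external() -> str:
    # subprocess.run: start an external program, wait for it to finish,
    # and capture its output. Here the "external program" is just
    # another Python interpreter (an assumption made for portability).
    result = subprocess.run(
        [sys.executable, "-c", "print('hello from a child process')"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


def run_parallel(values):
    # ProcessPoolExecutor: apply a Python function to many inputs
    # using a pool of OS processes; map() preserves input order.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    # The guard matters: with the "spawn" start method, worker
    # processes re-import this module.
    print(run_external())
    print(run_parallel(range(5)))
```

Process and Pool could do the second job too, but you would be managing more of the plumbing yourself, which is the point about starting at the higher end of the gradient.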
Imagining I'm a newbie to Python concurrency, I Googled "concurrency in Python" and picked the first result from the official docs. https://docs.python.org/3/library/concurrency.html It's a list of everything except asyncio, and the first item on the list is the low-level `threading` :S At least that page mentions ThreadPoolExecutor, queue, and asyncio as alternatives, but I'm still lost on what the correct way is.