I don't think I can do all of those justice right this moment, but let's get a few here so people can link to an HN thread instead of the Red Hat article ;)

MP is good when the GIL would hamper your thread concurrency, i.e. your problem is likely CPU-bound rather than IO/network-bound (for IO, CPython is good about releasing the GIL while it waits). The cost is process overhead: each worker is a new instance of Python running the worker code, and any data it needs has to be pickled to and from the worker across a process boundary (rather than shared between threads). The benefit is mostly the last answer: you can saturate all available CPUs.

Ordering: there are a few options, but generally something like `concurrent.futures`' `Executor.map` will give you results in task order, while `.submit` followed by iterating over `concurrent.futures.as_completed` will give you results out of order. If you return an ID of what you were working on, you can reorder afterwards, and that may be worthwhile if the workloads are highly variable.

Exceptions: capture all of them in your worker, make them available to the main process via an event or queue, check that signal periodically in main, and take action as needed. For the other case (Ctrl+C in your main process), have your workers check a signal from main as often as needed for the responsiveness you want, and have each worker clean up and quit on interrupt signals.

Data transmission feels too problem-dependent to give a single answer, but if you're processing, say, files, don't read a file and pass its bytes to a worker. Pass the file's location and let the worker read the file and return or write the results.
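A minimal sketch of the ordering and data-transmission points, assuming `ProcessPoolExecutor` and a hypothetical `process_file` worker that receives only a path, reads the file itself, and returns `(path, result)` so the path doubles as the task ID. `run_in_order`, `run_as_completed`, and the "work" (just measuring file size) are illustrative stand-ins, not anyone's actual code.

```python
import os
import tempfile
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed


def process_file(path: str) -> tuple[str, int]:
    """Hypothetical worker: gets only the file's location, reads the file itself,
    and returns (path, result) so the path can serve as the task ID."""
    with open(path, "rb") as f:
        data = f.read()
    return path, len(data)  # stand-in for real CPU-bound work


def run_in_order(paths: list[str], executor: Executor) -> list[tuple[str, int]]:
    # Executor.map yields results in input order, even if later tasks finish first.
    return list(executor.map(process_file, paths))


def run_as_completed(paths: list[str], executor: Executor) -> list[tuple[str, int]]:
    # submit() + as_completed() yields each future as soon as it finishes,
    # so results arrive in completion order, not input order.
    futures = [executor.submit(process_file, p) for p in paths]
    results = []
    for fut in as_completed(futures):
        # result() re-raises any exception the worker raised; handle each
        # result here as it arrives.
        results.append(fut.result())
    # Each result carries its path (the "ID"), so input order can be restored.
    position = {p: i for i, p in enumerate(paths)}
    results.sort(key=lambda r: position[r[0]])
    return results


if __name__ == "__main__":  # required for process pools on spawn-based platforms
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, size in enumerate([300, 10, 2000]):
            path = os.path.join(tmp, f"file{i}.bin")
            with open(path, "wb") as f:
                f.write(b"x" * size)
            paths.append(path)
        with ProcessPoolExecutor() as pool:
            print(run_in_order(paths, pool))
            print(run_as_completed(paths, pool))
```

The final sort in `run_as_completed` is only needed if you want input order back at the end; the point of `as_completed` is that the loop body can act on fast tasks without waiting behind slow ones.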
BTW, I haven't touched MP in Python for a couple of years now, but I remember there being some misalignment in `concurrent.futures` between `pool.map` and `pool.submit`.
And for exceptions in background processes, I had a shared bool variable called "emergency brake" and an error queue.
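A minimal sketch of the emergency-brake idea with plain `multiprocessing`, assuming a `multiprocessing.Event` stands in for the shared bool and two `multiprocessing.Queue`s carry errors and results. `worker`, `drain`, `run`, and the squaring "work" are hypothetical. Workers check the brake between items, push any exception's traceback onto the error queue and pull the brake; main polls both queues and sets the brake on the first error or on Ctrl+C. The worker also catches `KeyboardInterrupt` because, in a terminal, Ctrl+C is typically delivered to the whole foreground process group, not just main.

```python
import multiprocessing as mp
import queue
import time
import traceback


def worker(worker_id, items, brake, errors, results):
    """Hypothetical worker: squares items, checking the brake between items."""
    try:
        for item in items:
            if brake.is_set():
                return  # main or another worker pulled the emergency brake
            if item < 0:
                raise ValueError(f"bad item {item!r}")
            results.put((worker_id, item, item * item))  # stand-in for real work
    except KeyboardInterrupt:
        return  # Ctrl+C: do any cleanup here, then quit quietly
    except Exception:
        errors.put((worker_id, traceback.format_exc()))  # capture everything
        brake.set()  # tell the other workers to stop too


def drain(q, sink):
    """Move whatever is currently in q into the list sink without blocking."""
    while True:
        try:
            sink.append(q.get_nowait())
        except queue.Empty:
            return


def run(chunks, poll_interval=0.05):
    brake = mp.Event()   # plays the role of the shared "emergency brake" bool
    errors = mp.Queue()  # tracebacks reported by workers
    results = mp.Queue()
    procs = [
        mp.Process(target=worker, args=(i, chunk, brake, errors, results))
        for i, chunk in enumerate(chunks)
    ]
    for p in procs:
        p.start()

    collected, failures = [], []
    while any(p.is_alive() for p in procs):
        try:
            # Draining while workers run also keeps a child from blocking at
            # exit on queue data that nobody has read yet.
            drain(results, collected)
            drain(errors, failures)
            if failures:
                brake.set()  # take action: stop everything on the first error
            time.sleep(poll_interval)
        except KeyboardInterrupt:
            brake.set()  # Ctrl+C in main: ask workers to clean up and quit
    for p in procs:
        p.join()
    drain(results, collected)  # pick up anything sent just before exit
    drain(errors, failures)
    return collected, failures


if __name__ == "__main__":
    done, problems = run([[1, 2, 3], [4, -5, 6]])
    print("results:", done)
    for wid, tb in problems:
        print(f"worker {wid} failed:\n{tb}")
```

How much work gets done after an error depends on timing, since other workers only stop at their next brake check; checking more often buys responsiveness at the cost of a little overhead per item.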