by emmelaich 1061 days ago
Why would they get stuck or die, and why wouldn't that happen with threads, which might be even worse since they're more tightly bound? At least with zombies or inactive processes you can detect and kill them externally, if need be.

I haven't played with multiprocessing at scale, so I'm genuinely interested.


If subprocesses die (segfault maybe), it isn't uncommon for them not to be cleaned up, and/or for the parent process to hang while it waits for the zombie to respond. That's one I experienced last week on Python 3.9. A thread that experienced that would likely kill the parent process or maybe even exit with a stack trace. That is way easier to debug, and doesn't require me to search through running tasks and manually kill them after each debug cycle.

My impression is that the multiprocessing module is a heroic effort, but unfortunately making the whole system work transparently across multiple OSs and architectures is a nearly insurmountable problem.

You may be interested in the concurrent.futures library, available for over a decade now. It keeps you from shooting yourself in the foot like that.

https://docs.python.org/3/library/concurrent.futures.html

Why do you think it would help?

It provides a nice interface, but it uses multiprocessing or multithreading under the hood, depending on which executor you use:

> The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.

Your trouble seems to involve not understanding how to set up signal handlers, which ProcessPoolExecutor handles for you and exposes via a BrokenProcessPool exception.
> Derived from BrokenExecutor (formerly RuntimeError), this exception class is raised when one of the workers of a ProcessPoolExecutor has terminated in a non-clean fashion (for example, if it was killed from the outside).
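A minimal sketch of that behaviour, not code from this thread: the helper name `run_crashing_task` is hypothetical, and `os._exit(1)` is only a stand-in for a segfault or an external kill. Under those assumptions, the parent should see a BrokenProcessPool from `future.result()` rather than hang.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def run_crashing_task() -> str:
    """Submit a task whose worker dies abruptly; report what the parent sees."""
    with ProcessPoolExecutor(max_workers=1) as pool:
        # os._exit() ends the worker immediately and skips all cleanup,
        # which stands in here for a segfault or a kill from outside.
        future = pool.submit(os._exit, 1)
        try:
            # The timeout is a safety net so the sketch itself cannot hang.
            future.result(timeout=30)
        except BrokenProcessPool:
            return "broken"
    return "ok"


if __name__ == "__main__":
    print(run_crashing_task())
```

Submitting `os._exit` directly keeps the sketch free of pickling concerns; a real task would normally be a module-level function. Once the pool is broken it cannot be reused, so a retry needs a fresh executor.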

What if it hangs?

That isn’t the scenario originally described, but there is a timeout parameter in future.result().
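A rough sketch of the timeout on `future.result()`, with `time.sleep` standing in for a hung task and a hypothetical helper name `wait_with_deadline`. Note that the timeout only stops the caller from waiting; the worker keeps running until the task returns.

```python
import time
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def wait_with_deadline(delay: float, timeout: float) -> str:
    """Run time.sleep(delay) in a worker, waiting at most `timeout` seconds."""
    pool = ProcessPoolExecutor(max_workers=1)
    try:
        future = pool.submit(time.sleep, delay)
        try:
            future.result(timeout=timeout)
        except FutureTimeout:
            # The caller is unblocked, but the worker is still sleeping.
            return "timed out"
        return "finished"
    finally:
        # shutdown() still waits for the running task, so a task that is
        # truly stuck would have to be dealt with some other way.
        pool.shutdown(wait=True)


if __name__ == "__main__":
    print(wait_with_deadline(delay=1.0, timeout=0.2))
```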
Always setting a timeout on every IPC or network operation helps immensely. IIRC the multiprocessing module allows that everywhere, but defaults to waiting forever in a couple of places.
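One way this could look with the plain multiprocessing API, as a sketch only: `Process.join()` blocks indefinitely unless it is given a timeout, so the hypothetical helper `join_or_kill` below passes one and terminates the child if it is still alive afterwards. `time.sleep` again stands in for a hung worker.

```python
import multiprocessing as mp
import time


def join_or_kill(delay: float, timeout: float) -> str:
    """Start a child that sleeps for `delay` seconds; give it `timeout` to exit."""
    proc = mp.Process(target=time.sleep, args=(delay,))
    proc.start()
    proc.join(timeout)  # without the argument, join() waits forever
    if proc.is_alive():
        # The child missed its deadline: stop it, then reap it.
        proc.terminate()
        proc.join()
        return "terminated"
    return "exited"


if __name__ == "__main__":
    print(join_or_kill(delay=30.0, timeout=0.2))
```

The second `join()` after `terminate()` matters: it collects the child's exit status so it does not linger as a zombie.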
Zombies don't respond; they merely have to be wait()'d for, which should take microseconds at most.
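To illustrate, here is a small sketch using `subprocess` rather than multiprocessing; the helper name `reap_exited_child` and the half-second pause are assumptions for the demo. On Unix, a child that has exited but has not yet been waited for is a zombie, and `wait()` simply collects its exit status.

```python
import subprocess
import sys
import time


def reap_exited_child() -> tuple[int, float]:
    """Let a child exit, then reap it; return (exit code, seconds spent in wait)."""
    child = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])
    # Usually long enough for the child to exit. On Unix it then sits in
    # the process table as a zombie until the parent waits for it.
    time.sleep(0.5)
    start = time.perf_counter()
    code = child.wait()  # reaps the child and returns its exit status
    elapsed = time.perf_counter() - start
    return code, elapsed


if __name__ == "__main__":
    code, elapsed = reap_exited_child()
    print(f"exit code {code}, wait() took {elapsed:.6f}s")
```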

I've seen orphaned processes sometimes idle, sometimes busy doing god knows what. But zombies, OTOH, are rarely a problem and should be easy to deal with.

Perhaps Python's desire to be Windows-compatible militates against a design more suitable for Unix.