by asksol 3662 days ago
>Be careful using the multiprocessing module, it has some very serious bugs. I've spent the last 4 years rewriting parts of it for Celery

I regretted this as soon as I submitted it. I would hate for someone to do the same thing to my projects, so I should know better. I've written about it before, but I realize that you probably have not read it :)

I really like the multiprocessing library; it helped me start Celery in the first place. The problem it tries to solve is actually very complicated, and you would have to test it on production systems for years to be sure it works. I think Celery was the app that did that testing. I contributed some fixes back into Python, but most of the work has not been merged upstream.

The most complicated issue I had to solve was that multiprocessing.Pool uses POSIX semaphores to share pipes between processes (that's how the pool processes receive jobs and the parent receives results). If a child process is killed before releasing that semaphore, you have a deadlock that's tricky, if not impossible, to solve. So I rewrote the pool to use async I/O instead, which also had the side effect of drastically improving performance (no locks). Sadly, I'm not sure how to implement that on Windows, so it's unlikely to be merged upstream. Other fixes and features used by Celery are available in billiard, our fork of multiprocessing (on PyPI), but the async pool is not part of that yet, as it currently depends on code in Celery that does not fit in billiard (it should be rewritten to use asyncio now).
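
To make the failure mode concrete, here is a minimal sketch of a child dying while it holds a multiprocessing lock. It is not code from Celery, billiard, or multiprocessing.Pool itself; the function names are made up for illustration, and it assumes a POSIX system where the "fork" start method is available and multiprocessing.Lock is backed by a semaphore.

```python
import multiprocessing as mp
import os
import signal
import time


def hold_lock(lock, ready):
    # Hypothetical worker: take the lock, signal the parent, then stall
    # so that it is killed before it ever releases the lock.
    lock.acquire()
    ready.set()
    time.sleep(60)


def lock_stays_held_after_kill(timeout=1.0):
    """Return True if the lock is still held after its owner was killed."""
    ctx = mp.get_context("fork")  # assumption: POSIX only
    lock = ctx.Lock()
    ready = ctx.Event()

    child = ctx.Process(target=hold_lock, args=(lock, ready))
    child.start()
    if not ready.wait(10):
        child.kill()
        raise RuntimeError("child never acquired the lock")

    # Kill the child while it owns the lock. SIGKILL cannot be caught,
    # so no cleanup code in the child gets a chance to release it.
    os.kill(child.pid, signal.SIGKILL)
    child.join()

    # Without a timeout, this acquire would block forever.
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return not acquired


if __name__ == "__main__":
    print("lock still held by dead process:", lock_stays_held_after_kill())
```

Nothing releases the semaphore on behalf of the dead process, so every other process waiting on it is stuck; a timeout only lets you notice the problem, it does not tell you whether the shared pipe was left in a consistent state.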

You can claim to replace Celery using a small layer on top of async I/O, or claim to replace Celery with a simple Redis list operation, but I think that's unfair to all the work that went into Celery, and the other features Celery implements like monitoring, workflows, and a large list of other things that you don't immediately think of when starting a project. It keeps a repository of these patterns for the Python community, and even something like crossbar.io could be supported as a transport.

1 comment

I would never claim that you can use "a small layer on top of asyncio to replace celery". I've read the Celery codebase; it's very, very thorough. Also, in the future I may even try to integrate Celery into the asyncio event loop so I don't have to start a separate process.