Hacker News
by asksol 3666 days ago
>This is a very limited type of communication. Asyncio with stuff like crossbar.io allows pub/sub.

Celery also supports pub/sub, and other topologies.

>So can your celery workers. It happened to me many times.

With the major difference that your tasks can be redelivered to a different worker, and so will complete anyway.
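To illustrate what redelivery means here, below is a minimal toy sketch of acknowledgement-based redelivery. It is not Celery's code or any real broker's API: `ToyBroker`, its methods, the worker names, and the task name are all hypothetical. It assumes the worker acknowledges a message only after the task has finished, so anything a dead worker left unacknowledged goes back on the queue.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory broker, for illustration only."""

    def __init__(self):
        self.ready = deque()   # messages waiting for a worker
        self.unacked = {}      # worker name -> delivered but unacknowledged messages

    def publish(self, message):
        self.ready.append(message)

    def deliver(self, worker):
        # Hand the next message to a worker and remember that it is still unacknowledged.
        message = self.ready.popleft()
        self.unacked.setdefault(worker, []).append(message)
        return message

    def ack(self, worker, message):
        # The worker confirms the task is done; the broker can forget the message.
        self.unacked[worker].remove(message)

    def worker_lost(self, worker):
        # Requeue everything the dead worker never acknowledged, keeping the original order.
        for message in reversed(self.unacked.pop(worker, [])):
            self.ready.appendleft(message)


def run_demo():
    broker = ToyBroker()
    broker.publish("resize-image-42")
    completed = []

    # Worker A receives the task but dies before acknowledging it.
    broker.deliver("worker-a")
    broker.worker_lost("worker-a")

    # Worker B receives the same task, finishes it, and only then acknowledges.
    message = broker.deliver("worker-b")
    completed.append(("worker-b", message))
    broker.ack("worker-b", message)
    return completed, list(broker.ready)
```

The point of the sketch is only that the task outlives the worker that first picked it up; a real broker tracks connections and acknowledgements in a far more involved way.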

>Well, the main point of the article is that celery is solving the GIL. It's not, it's bypassing it

I was agreeing with you there, but I guess my reply was not clear on that. I just wanted to point out some inaccuracies in your reply.

>coroutines and multiprocessing

Be careful using the multiprocessing module, it has some very serious bugs. I've spent the last 4 years rewriting parts of it for Celery

2 comments

>Be careful using the multiprocessing module, it has some very serious bugs. I've spent the last 4 years rewriting parts of it for Celery

I regretted this as soon as I submitted it. I would hate for someone to do the same thing to my projects, so I should know better. I've written about it before, but I realize that you probably have not read it :)

I really like the multiprocessing library, it helped me start Celery in the first place. What it tries to solve is actually very very complicated, and you would have to test it on production systems for years to be sure it works, and I think Celery was the app that did that testing. I contributed some fixes back into Python, but most of it is not merged upstream.

The most complicated issue I had to solve was that multiprocessing.Pool uses POSIX semaphores to share pipes between processes (that's how the pool processes receive jobs, and how the parent receives results). If a child process is killed before releasing that semaphore, you have a deadlock that's tricky, if not impossible, to solve. So I rewrote the pool to use async I/O instead, which also had the side effect of drastically improving performance (no locks). Sadly I'm not sure how to implement that on Windows, so it's unlikely to be merged upstream. Other fixes and features used by Celery are available in billiard, our fork of multiprocessing (on PyPI), but the async pool is not part of that yet, as it currently depends on code in Celery that does not fit in billiard (it should be rewritten to use asyncio now).
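The hazard can be reproduced in a few lines. This is a minimal sketch, not the pool's actual code: it assumes a POSIX system where the "fork" start method is available, and the helper names `_take_lock_and_die` and `lock_recoverable_after_child_death` are made up for the example. A child takes a `multiprocessing.Lock` (semaphore-backed on POSIX) and exits abruptly without releasing it, and the parent then tries to acquire the same lock with a timeout.

```python
import multiprocessing as mp
import os


def _take_lock_and_die(lock):
    # Acquire the shared lock, then exit abruptly without releasing it,
    # standing in for a pool worker that is killed mid-operation.
    lock.acquire()
    os._exit(1)


def lock_recoverable_after_child_death(timeout=1.0):
    # Assumption: "fork" is available (POSIX); the child inherits the lock directly.
    ctx = mp.get_context("fork")
    lock = ctx.Lock()
    child = ctx.Process(target=_take_lock_and_die, args=(lock,))
    child.start()
    child.join()  # the child is gone, but nothing released the lock on its behalf

    # Without the timeout this call would block forever.
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired


if __name__ == "__main__":
    print(lock_recoverable_after_child_death())
```

On such a system the parent should time out instead of getting the lock, since nothing releases the semaphore when its holder dies. Giving each worker its own pipe and multiplexing them from the parent removes the shared lock, which is the direction described above.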

You can claim to replace Celery using a small layer on top of async I/O, or claim to replace Celery with a simple Redis list operation, but I think that's unfair to all the work that went into Celery, and the other features Celery implements like monitoring, workflows, and a large list of other things that you don't immediately think of when starting a project. It keeps a repository of these patterns for the Python community, and even something like crossbar.io could be supported as a transport.

I would never claim that you can use "a small layer on top of asyncio to replace celery". I've read the Celery codebase, and it's very, very thorough. Also, in the future I may even try to integrate Celery into the asyncio event loop so I don't have to start a separate process.

Hi asksol. I'll take this opportunity to publicly thank you for all the help you give on IRC for your Celery project. You saved my ass many times. You rock.