Hacker News
by ledil 3943 days ago
What do you consider as alternative to celery?
3 comments

It depends on what you use Celery for.

For launching a bunch of IO-bound "tasks", for example calling external services from Django views, I'd consider using Twisted (or Tornado, or asyncio). Your tasks would need to be either written in async style, or you'd need to spawn new processes from within Twisted (but built-in functionality makes this rather easy). Still, Twisted is rock-solid, doesn't leak and is capable of handling a lot of (IO-bound) tasks concurrently.
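
To make the async style concrete, here is a minimal sketch using asyncio from the standard library (the same idea applies to Twisted or Tornado). The `call_service` coroutine, the service names and the delays are made up for illustration; `asyncio.sleep` stands in for a real network call.

```python
import asyncio

async def call_service(name, delay):
    # Hypothetical stand-in for an IO-bound call such as an HTTP request.
    await asyncio.sleep(delay)
    return f"{name} done"

async def run_all():
    # Start both calls concurrently; gather returns results in argument order.
    return await asyncio.gather(
        call_service("billing", 0.02),
        call_service("search", 0.01),
    )

if __name__ == "__main__":
    print(asyncio.run(run_all()))
```

Both calls wait at the same time, so the total wall time is roughly the longest single delay rather than the sum.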

If your tasks are CPU-bound, you pretty much have no choice other than something based on multiprocessing. You can still use Twisted, but only in the second way (spawning new processes from within it). If the code of your tasks doesn't use C extensions, you could use Jython with threads. This way you'd get parallelism without having to rewrite much code.
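
A rough sketch of the multiprocessing route, using only the standard library. The `count_primes` function is a deliberately naive placeholder for real CPU-bound work, and the pool size of 2 is an arbitrary assumption.

```python
from multiprocessing import Pool

def count_primes(limit):
    # Naive CPU-bound placeholder: count the primes below `limit`.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_parallel(limits, processes=2):
    # Each input is handled in a separate worker process, so the GIL
    # does not serialize the computation.
    with Pool(processes=processes) as pool:
        return pool.map(count_primes, limits)

if __name__ == "__main__":
    print(run_parallel([10_000, 20_000, 30_000]))
```

The `if __name__ == "__main__"` guard matters here: on platforms that start workers by re-importing the main module, leaving it out makes every worker try to create its own pool.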

If you need your tasks parallelized and you want to run a lot of them concurrently then I'm afraid you're out of options in Pythonland. Personally I'd go for Erlang with ErlPort, but I know Erlang rather well.

On the other hand, Celery is a nice piece of code. I think in most cases you don't need anything else, or at least nothing drastically different, like the options above. Perhaps rq would be a good idea. I also encountered an interesting project called Pulsar (http://pythonhosted.org/pulsar/overview.html), but it seems to be usable only on Python 3.3 and above.

Under some circumstances, if you're running periodic jobs, APScheduler works well in a pinch.

If you find one, let me know ;) I'm looking for something myself currently.

There's RQ (http://python-rq.org/), but its design seems similar to Celery's (just with a simpler architecture), so it probably suffers from the same problem.

A good solution would be to have a series of workers that can launch new independent Python processes for each task, e.g. using the subprocess module.
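
Roughly what I have in mind, as a sketch only: the helper name `run_task_in_fresh_process` and the 60-second timeout are my own assumptions, and a real worker would pull task descriptions from a queue instead of taking a code string.

```python
import subprocess
import sys

def run_task_in_fresh_process(code, timeout=60):
    # Start a brand-new interpreter for this one task, so nothing is
    # shared with the worker or with earlier tasks.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the task exits non-zero
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(run_task_in_fresh_process("print(sum(range(10)))"))
```

Since every task gets its own interpreter, it imports whatever code is on disk at launch time, and a crash or leak in one task can't affect the next.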

I have long-running jobs (say, 5 minutes on average, up to an hour). I originally used Celery (after PiCloud shut down), but it just doesn't work well with those characteristics. Each worker reserved an extra job, so it was impossible to get good CPU utilisation.

I switched to rq and it's all been much easier. The behaviour is easy to understand and it's easy to inspect Redis to see what's going on.

In terms of the code restart angle - I'm fairly sure you can effectively restart the workers. They run as a single process that forks to do work. Each copy you run only has a single worker, so you need to run multiple instances yourself. If you kill the parent it waits until the child has finished the job it is on before terminating.

I could be wrong about some of the details. I'd recommend giving it a shot. I must have run hundreds of thousands of jobs through it now and I haven't had a single issue.

http://python-rq.org/docs/workers/

Thanks, that sounds interesting!