Hacker News new | ask | show | jobs
by cbsmith 2830 days ago
> With Python, in practice most systems end up based on Celery, with huge long-running tasks.

Oh dear... Yeah, that's a terrible distributed system. Interestingly, all the distributed systems I've worked on with Python haven't had Celery as any kind of core component. It's just poorly suited for the job, as it is more of a task queue. A task queue is really not a good spine for a distributed system.

There are a lot of python distributed systems built around in memory stores, like pycos or dask, or built around existing distributed systems like Akka, Casandra, even Redis.

2 comments

> There are a lot of python distributed systems built around in memory stores, like pycos or dask, or built around existing distributed systems like Akka, Casandra, even Redis.

Cassandra and Redis just mean that you have a database-backed application. How do you schedule Python jobs? Either you build your own scheduler that pulls things out of the database, or you use an existing scheduler. I once worked on a Python system that scheduled tasks using Celery, used Redis for some synchronization flags, and Cassandra for the shared store (also for the main database). Building a custom scheduler for that system would have been a waste of time.

> Cassandra and Redis just mean that you have a database-backed application.

Oh there's a lot more to it than that. CRDT's... for example.

Well, Celery uses Rabbit-MQ and typically Redis underneath. Rabbit-MQ to pass messages, and Redis to store results.

You can scale up web servers to handle more requests, which then uses Celery to offload jobs to different clusters.

Yeah, but fundamentally Celery is a task queue. You don't build a distributed system around that.