Hacker News
by rjzzleep 912 days ago
The one thing I genuinely don't understand is why there is no single task queue that works across Ruby and Python. I get that at some point people just started building HTTP-based microservices to pass information around, but at the end of the day a simple task queue with a unified storage format across both languages is a better way to connect a Ruby (Rails) app with the ML stack. There are probably thousands of custom RabbitMQ- or Redis-based private company solutions out there.
6 comments

Mike Perham (the Sidekiq maintainer) also maintains the lesser-known Faktory[0], which is language agnostic and has runners for both Ruby and Python.

[0] https://github.com/contribsys/faktory

That's awesome. Any idea why he doesn't just supersede Sidekiq with that? I spent quite some time hacking my own solution.

I just looked at the source, I guess Mike has been mostly working on both projects on his own for the last 4 years, so Faktory has a lot of features that require the enterprise license.

I wonder if he could change the situation if he markets a bit more to the Python and, more specifically, Django community.

It is pretty cool. I’ve been using it for 3-4 years as a queue between Ruby and Node for scraping tasks. Ruby queues the work, a node worker does the downloading, and a Ruby process parses and loads data. It works incredibly well in my use case and has saved me from having to shift everything to one language or the other. It’s been very reliable.
Because people running Sidekiq with their Ruby app in production don’t care about a cross-language queue. If you pay for Pro or Enterprise, you don’t want any major changes that are potentially breaking.
The only reason we pay is because the pro version doesn't lose jobs if a worker crashes. You would think that would be a core feature.
Sounds like Mike found a good feature that would encourage companies to purchase a license. I have a tremendous admiration for the business he's built.
You would think not losing jobs would be a required feature in any job queue software product.
A man’s gotta eat.
This relies on the Go runtime scheduler. Sometimes that is not good enough.
My customers are using Celery for Python and Sidekiq for Ruby. Those are parts of Django and Rails web apps. Those customers don't mix languages so they don't need workers able to run code in multiple languages. One of them is also using SQS though so we could receive a JSON in a server written in any language, do some processing and return the result. However the database of that app has been "destroyed by design" by using Django's ORM inheritance (my suggestion: never use it) so nothing can interact effectively with it except Python code using the same models.

By the way, celery was born as a protocol spec to support multiple languages but never moved past Python. I can't google a quote for that, I remember I saw it years ago in the documentation somewhere.

I can see that it was born that way, but nowadays most job queues say, don't touch the wire protocol, it's not intended to be used directly.

I still think Rails has the best ORM design I've ever seen, iterated with practical applications. Django's ORM and migrations are, for lack of a better word, odd.

I'm a bit surprised that people here argue that no one using Rails would ever want to interface with other languages. Most big companies do. How can you not interface with Python these days?

Looks to me like Celery might just be the only job queue left like that. Might be worth writing a current Ruby enqueuing library for it. Retracting my previous statement about ActiveJob since it would probably be too much effort to execute anything bidirectionally.

https://docs.celeryq.dev/en/stable/internals/protocol.html#

What's wrong with Django's orm? Does it have something particularly bad compared to others?
I think the poster is specifically referring to inheritance in the Django ORM, e.g. where you have a model Book and then a model Novel that inherits from it. In Python these are modeled as a class inheritance hierarchy, and Django (at least by default) creates a database table per class in the hierarchy. If you have 3-4 levels of inheritance, that's 3-4 extra joins per query.
This. That customer of mine started a project a few years before hiring me. They used inheritance and each model is scattered around a number of tables. No external tool can sensibly access that database, except that very Django app and its manage.py commands. Add a similarly enthusiastic use of apps under the same main directory and the database is a mess of long named tables with a tangle of relationships between them.

We started another project later on and we planned the database first. We wrote one model per table, no inheritance, only a few cleanly delimited apps. We still use makemigrations and migrate but if we want we can write a piece of software on any language to access that database.

We use that, a base model and some mixins, but we use Meta.abstract=True (or similar, not at my desk) on the parents and have not noticed any issues, though have not looked for them either! Should I be concerned?
That works fine. It's when you inherit from a non-abstract model that you end up with the trickier data model.
Oh, got it thanks
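
For anyone who wants to see the join cost from the Book/Novel comment concretely, here is a minimal sketch using Python's built-in sqlite3 rather than Django itself. The table and column names (book, novel, graphic_novel, book_ptr_id, novel_ptr_id) are illustrative assumptions loosely modeled on how Django links a child table to its parent; a real Django schema would also prefix table names with the app label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per class in the hierarchy; each child row points at its parent row.
    CREATE TABLE book (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE novel (
        book_ptr_id INTEGER PRIMARY KEY REFERENCES book(id),
        genre       TEXT NOT NULL
    );
    CREATE TABLE graphic_novel (
        novel_ptr_id INTEGER PRIMARY KEY REFERENCES novel(book_ptr_id),
        illustrator  TEXT NOT NULL
    );

    -- A single "GraphicNovel" object is spread over three rows in three tables.
    INSERT INTO book VALUES (1, 'Watchmen');
    INSERT INTO novel VALUES (1, 'superhero');
    INSERT INTO graphic_novel VALUES (1, 'Dave Gibbons');
""")

# Reading the full object back needs one join per inheritance level.
row = conn.execute("""
    SELECT b.title, n.genre, g.illustrator
    FROM graphic_novel AS g
    JOIN novel AS n ON n.book_ptr_id = g.novel_ptr_id
    JOIN book  AS b ON b.id = n.book_ptr_id
    WHERE b.id = ?
""", (1,)).fetchone()

print(row)
```

With abstract parents, the parent's fields are copied into each child's own table, so the same read becomes a plain single-table SELECT. That is why the abstract base model setup mentioned above does not hit this problem.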
There's beanstalkd, it has a few Python libraries and it works out of the box with ActiveJob via Backburner.

https://beanstalkd.github.io/

Because it's a rare need and easy to write your own based on postgres or redis.

External systems come with a cost, especially if it's more than a library.
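
To make the "write your own" idea a bit more concrete, here is a rough sketch of a database-backed queue with a language-neutral payload. It uses sqlite3 as a stand-in for Postgres so it runs anywhere; the jobs table, its columns, and the enqueue/claim/complete helpers are all made up for illustration. On Postgres, a common way to do the claim step is SELECT ... FOR UPDATE SKIP LOCKED instead of the guarded UPDATE used here.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      TEXT PRIMARY KEY,
    queue   TEXT NOT NULL,
    payload TEXT NOT NULL,                  -- JSON: {"job": name, "args": [...]}
    status  TEXT NOT NULL DEFAULT 'queued'  -- queued -> running -> done
);
"""

def enqueue(conn, queue, job, args):
    """Store a job as plain JSON so any language can read it."""
    job_id = uuid.uuid4().hex
    payload = json.dumps({"job": job, "args": args})
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO jobs (id, queue, payload) VALUES (?, ?, ?)",
            (job_id, queue, payload),
        )
    return job_id

def claim(conn, queue):
    """Return (job_id, payload) for the oldest queued job, or None if empty."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE queue = ? AND status = 'queued' ORDER BY rowid LIMIT 1",
            (queue,),
        ).fetchone()
        if row is None:
            return None
        job_id, payload = row
        with conn:
            # Guarded update: only succeeds if the job is still queued,
            # so two workers cannot both claim the same job.
            cur = conn.execute(
                "UPDATE jobs SET status = 'running' "
                "WHERE id = ? AND status = 'queued'",
                (job_id,),
            )
        if cur.rowcount == 1:
            return job_id, json.loads(payload)
        # Another worker won the race; try the next job.

def complete(conn, job_id):
    """Mark a claimed job as finished."""
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
job_id = enqueue(conn, "scrape", "download_page", ["https://example.com"])
claimed = claim(conn, "scrape")
complete(conn, claimed[0])
```

Because the payload is just JSON in a text column, a Ruby producer or worker only needs a database driver and JSON.parse to take part. A job that was claimed but never completed stays in the running state, so a periodic sweep could put it back in the queue after a worker crash; that part is left out here.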

> The one thing I genuinely don't understand is why there is no single task queue that works across ruby and python.

Not sure I can recommend it, but Gearman does fit the bill. There are client and worker libraries for a dozen languages, including Python 3 and Ruby.

There is Perl Directory::Queue / Python dirq which also has implementations in Go, Java, and C.

A Ruby implementation would probably not be hard.

I don't know how much people use this in serious or high-performance work, but it might be an option.
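
For a sense of why a directory-based queue is easy to port, here is a minimal sketch of the general idea: one file per job, with atomic renames used to publish and claim work. This is not dirq's actual API or on-disk format; the tmp/new/cur layout and the function names are assumptions made up for this example.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path

def init_queue(root):
    """Create the three subdirectories used by this sketch."""
    for sub in ("tmp", "new", "cur"):
        (root / sub).mkdir(parents=True, exist_ok=True)

def enqueue(root, payload):
    """Write the job to tmp/, then rename it into new/ so readers never see a partial file."""
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
    tmp_path = root / "tmp" / name
    tmp_path.write_text(json.dumps(payload))
    os.rename(tmp_path, root / "new" / name)  # atomic on the same filesystem
    return name

def claim(root):
    """Move the first available job from new/ to cur/ and return (name, payload), or None."""
    for name in sorted(os.listdir(root / "new")):
        claimed_path = root / "cur" / name
        try:
            os.rename(root / "new" / name, claimed_path)
        except FileNotFoundError:
            continue  # another worker renamed it first
        return name, json.loads(claimed_path.read_text())
    return None

def complete(root, name):
    """Delete a claimed job once it has been processed."""
    (root / "cur" / name).unlink()

root = Path(tempfile.mkdtemp())
init_queue(root)
name = enqueue(root, {"job": "parse_page", "args": [42]})
claimed = claim(root)
complete(root, name)
```

Everything here is a file rename and a JSON read, so the same logic is only a few lines in Ruby or any other language that can see the same directory.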