by ksec 904 days ago
I think Ruby 3.3 is perhaps one of the most important and feature-rich Ruby releases in the past 10 years. I never thought Ruby would have a shipping, production-ready JIT before Python. And Prism, Lrama, IRB. A lot of these were discussed in previous HN submissions.

But one thing that is not mentioned or discussed enough is Ractor, the M:N thread scheduler, Fiber and Async, especially in the context of Rails. I am wondering if anyone is using these features in production and if you could share any thoughts on the subject.

3 comments

> I never thought Ruby would have a shipping, production-ready JIT before Python.

This is entirely predictable: Ruby does not have a big scientific computing community that depends on every implementation detail of the host interpreter.

Python has a culture that sees writing C libraries as "Python" code, hence why.

It is quite common to see "Python" libraries that are just thin binding layers; they could just as well be "Tcl" libraries for that matter.

I should start doing numerical work in Tcl and see how long it takes for me to get sent to the madhouse.
Well, first step is to create bindings to the same libraries Python uses.
The problem is that NumPy is not in fact anything close to a thin wrapper around BLAS/LAPACK, as people seem to think it is.

First of all, it contains a ton of custom C code, which to some extent could be extracted to a separate library in theory, but isn't. Second, a lot of that custom code interacts deeply with the Python C API, which historically was very open-ended. Even getting it to work on another implementation of Python was a challenge that took a long time to reach baseline usability.

You could forgo NumPy and call out to a library like Eigen, but even then you have a huge amount of work ahead to achieve anything resembling feature parity.

Who singled out NumPy?

Still, your lengthy explanation only confirms how much C, and how little Python, that specific case happens to be.

I don't see how it's relevant given that YJIT didn't cause any compatibility issues whatsoever.
There's a talk from a couple of years ago by one of the YJIT developers that discusses this in some detail, and it's more interesting/complicated than that. The whole thing is worth checking out, but the specific section starts here:

https://youtu.be/vucLAqv7qpc?t=937

There are huge apps (such as Shopify) that will save a lot of money by having a more performant backend, so they invest heavily in it.

Python workloads with deep-pocketed backers spend more of their time inside GPU or C runtimes.

> I think Ruby 3.3 is perhaps one of the most important and feature-rich Ruby releases in the past 10 years.

Really? What’s so significant about this release?

> But one thing that is not mentioned or discussed enough is Ractor, the M:N thread scheduler, Fiber and Async.

Yes! Ractors deserve more highlighting! It’s a huge feature.

> Really? What’s so significant about this release?

I think the Prism parser update is a standout highlight for me. This is the start of many new static analysis tools for Ruby.

It's also significant that RBS type information is starting to be used in IRB autocompletion. Previously RBS has been an interesting experiment but hasn't had much practical use compared to Sorbet.

Ruby now seems to have good answers to non-blocking IO (async fibers) and tooling questions (ruby-lsp). We're also seeing YJIT performance improvements start to compound, with more to come.

That all seems significant to me. Thanks to everyone involved.

The one thing I genuinely don't understand is why there is no single task queue that works across Ruby and Python. I get that at some point people just started making HTTP-based microservices to pass information around, but at the end of the day a simple task queue with a unified storage format across both is a better way to connect a Ruby (Rails) app with the ML stack. There are probably thousands of custom RabbitMQ- or Redis-based private company solutions out there.
Mike Perham (the Sidekiq maintainer) also maintains the less well-known Faktory[0], which is language agnostic and has runners for both Ruby and Python.

[0] https://github.com/contribsys/faktory

That's awesome. Any idea why he doesn't just supersede Sidekiq with that? I spent quite some time hacking my own solution.

I just looked at the source, I guess Mike has been mostly working on both projects on his own for the last 4 years, so Faktory has a lot of features that require the enterprise license.

I wonder if he could change the situation if he marketed it a bit more to the Python, and more specifically the Django, community.

It is pretty cool. I’ve been using it for 3-4 years as a queue between Ruby and Node for scraping tasks. Ruby queues the work, a Node worker does the downloading, and a Ruby process parses and loads data. It works incredibly well in my use case and has saved me from having to shift everything to one language or the other. It’s been very reliable.
Because people running Sidekiq with their Ruby app in production don’t care about a cross-language queue. If you pay for Pro or Enterprise you don’t want any major changes that are potentially breaking.
The only reason we pay is because the pro version doesn't lose jobs if a worker crashes. You would think that would be a core feature.
Sounds like Mike found a good feature that would encourage companies to purchase a license. I have a tremendous admiration for the business he's built.
A man’s gotta eat.
This relies on the Go runtime scheduler. Sometimes that is not good enough.
My customers are using Celery for Python and Sidekiq for Ruby. Those are parts of Django and Rails web apps. Those customers don't mix languages, so they don't need workers able to run code in multiple languages. One of them is also using SQS though, so we could receive a JSON message in a server written in any language, do some processing and return the result. However, the database of that app has been "destroyed by design" by using Django's ORM inheritance (my suggestion: never use it), so nothing can interact effectively with it except Python code using the same models.

By the way, Celery was born as a protocol spec to support multiple languages but never moved past Python. I can't google a quote for that; I remember I saw it years ago in the documentation somewhere.

I can see that it was born that way, but nowadays most job queues say: don't touch the wire protocol, it's not intended to be used directly.

I still think Rails has the best ORM design I've ever seen, iterated with practical applications. Django's ORM and migrations are, for lack of a better word, odd.

I'm a bit surprised that people here argue that no one using Rails would ever want to interface with other languages. Most big companies do. How can you not interface with Python these days?

Looks to me like Celery might just be the only job queue left like that. Might be worth writing a current Ruby enqueuing library for it. Retracting my previous statement about ActiveJob, since it would probably be too much effort to execute anything bidirectionally.

https://docs.celeryq.dev/en/stable/internals/protocol.html#

What's wrong with Django's ORM? Does it have something particularly bad compared to others?
I think the poster is specifically referring to using inheritance in the Django ORM, where you have e.g. a model Book and then a model Novel that inherits from it. In Python these are modeled as a class inheritance hierarchy, and Django (at least by default) creates a database table per class in the hierarchy. If you have 3-4 levels of inheritance, that's 3-4 extra joins per query.
This. That customer of mine started a project a few years before hiring me. They used inheritance and each model is scattered around a number of tables. No external tool can sensibly access that database, except that very Django app and its manage.py commands. Add a similarly enthusiastic use of apps under the same main directory and the database is a mess of long named tables with a tangle of relationships between them.

We started another project later on and we planned the database first. We wrote one model per table, no inheritance, only a few cleanly delimited apps. We still use makemigrations and migrate, but if we want we can write a piece of software in any language to access that database.

We use that, a base model and some mixins, but we use Meta.abstract=True (or similar, not at my desk) on the parents and have not noticed any issues, though have not looked for them either! Should I be concerned?
That works fine. It's when you inherit from a non-abstract model that you end up with the trickier data model.
Oh, got it, thanks.
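
For anyone who wants to see what the table-per-class layout discussed above looks like from outside Django, here is a rough sketch using plain sqlite3 rather than Django itself. The app label `library`, the third model `GraphicNovel`, and all the field names are made up for illustration; the `<parent>_ptr_id` primary key columns follow Django's default naming for multi-table inheritance, but treat the exact schema as an approximation of what a real migration would emit.

```python
import sqlite3

# In-memory database standing in for the app's real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Book is the concrete (non-abstract) parent: it gets its own table.
    CREATE TABLE library_book (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    -- Novel(Book): only Novel's own fields, plus a pointer to its Book row.
    CREATE TABLE library_novel (
        book_ptr_id INTEGER PRIMARY KEY REFERENCES library_book(id),
        genre       TEXT NOT NULL
    );
    -- GraphicNovel(Novel): hypothetical third level, same pattern again.
    CREATE TABLE library_graphicnovel (
        novel_ptr_id INTEGER PRIMARY KEY REFERENCES library_novel(book_ptr_id),
        illustrator  TEXT NOT NULL
    );
""")

# One logical object is spread over three rows in three tables.
conn.execute("INSERT INTO library_book VALUES (1, 'Watchmen')")
conn.execute("INSERT INTO library_novel VALUES (1, 'superhero')")
conn.execute("INSERT INTO library_graphicnovel VALUES (1, 'Dave Gibbons')")


def load_graphic_novel(conn, pk):
    """Reassemble one GraphicNovel: one join per level above the leaf table."""
    return conn.execute(
        """
        SELECT b.title, n.genre, g.illustrator
        FROM library_graphicnovel AS g
        JOIN library_novel AS n ON n.book_ptr_id = g.novel_ptr_id
        JOIN library_book  AS b ON b.id = n.book_ptr_id
        WHERE g.novel_ptr_id = ?
        """,
        (pk,),
    ).fetchone()
```

Any external tool has to know about these pointer columns and repeat the joins to read a single object. With an abstract parent (`abstract = True` in the parent's `Meta`), Django creates no table for the parent and puts its fields directly into each child table, so there is one table per concrete model and no join.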
There's beanstalkd, it has a few Python libraries and it works out of the box with ActiveJob via Backburner.

https://beanstalkd.github.io/

Because it's a rare need and easy to write your own based on postgres or redis.

External systems come with a cost, especially if it's more than a library.
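
As a minimal sketch of the "write your own" option, assuming a shared SQL table and a JSON payload that both the Ruby and the Python side agree on: the example below uses sqlite3 only so that it runs anywhere, and the `jobs` table, its columns, and the `{"task": ..., "args": ...}` envelope are all assumptions made up for illustration. On Postgres the reserve step would more typically use `SELECT ... FOR UPDATE SKIP LOCKED` so that several workers can poll concurrently.

```python
import json
import sqlite3


def connect(path=":memory:"):
    # isolation_level=None puts the sqlite3 module in autocommit mode,
    # so transactions below are opened and closed explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS jobs (
            id      INTEGER PRIMARY KEY AUTOINCREMENT,
            queue   TEXT NOT NULL,
            payload TEXT NOT NULL,               -- language-neutral JSON
            status  TEXT NOT NULL DEFAULT 'pending'
        )
        """
    )
    return conn


def enqueue(conn, queue, task, args):
    """Store a job as plain JSON so any language can produce or consume it."""
    payload = json.dumps({"task": task, "args": args})
    cur = conn.execute(
        "INSERT INTO jobs (queue, payload) VALUES (?, ?)", (queue, payload)
    )
    return cur.lastrowid


def reserve(conn, queue):
    """Claim the oldest pending job, or return None if the queue is empty."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE queue = ? AND status = 'pending' ORDER BY id LIMIT 1",
            (queue,),
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else (row[0], json.loads(row[1]))


def complete(conn, job_id):
    """Remove a finished job."""
    conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
```

A Rails app would insert rows with the same JSON shape and a Python worker would loop on `reserve` and `complete`. Retries, timeouts for jobs stuck in `running`, and scheduling are left out here, and those are where most of the real effort goes.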

> The one thing I genuinely don't understand is why there is no single task queue that works across ruby and python.

Not sure I can recommend it, but Gearman does fit the bill. There are client and worker libraries for a dozen languages, including Python 3 and Ruby.

There is Perl Directory::Queue / Python dirq which also has implementations in Go, Java, and C.

A Ruby implementation would probably not be hard.

I don't know how much people use this in serious or high-performance work, but it might be an option.
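
To give an idea of why a port would probably not be hard, here is a rough sketch of the general idea behind a directory-based queue: producers publish a file with an atomic rename, and consumers claim a file by renaming it again. This is not the actual Directory::Queue or dirq on-disk layout; the `.tmp`/`.job`/`.lck` suffixes and the function names are invented for illustration.

```python
import os
import time
import uuid


def add(queue_dir, data):
    """Publish one job (bytes). Readers never see a half-written file."""
    os.makedirs(queue_dir, exist_ok=True)
    # Timestamp prefix gives rough FIFO order; the uuid avoids name clashes.
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}"
    tmp_path = os.path.join(queue_dir, name + ".tmp")
    with open(tmp_path, "wb") as f:
        f.write(data)
    # Rename within one directory is atomic on POSIX filesystems.
    os.rename(tmp_path, os.path.join(queue_dir, name + ".job"))
    return name


def claim(queue_dir):
    """Take the oldest available job and return its bytes, or None."""
    for entry in sorted(os.listdir(queue_dir)):
        if not entry.endswith(".job"):
            continue  # skip files still being written or already claimed
        src = os.path.join(queue_dir, entry)
        dst = src[: -len(".job")] + ".lck"
        try:
            os.rename(src, dst)  # only one worker can win this rename
        except FileNotFoundError:
            continue  # another worker claimed it first
        with open(dst, "rb") as f:
            data = f.read()
        os.remove(dst)
        return data
    return None
```

Because the only primitives are "write a file" and "rename a file", any language with a filesystem API can take part. This sketch deletes the job as soon as it is read, so a worker that crashes mid-task loses it; a more careful version would keep the `.lck` file until the work is done and sweep stale locks.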