In Ruby, as in NodeJS, the GIL pushes you to scale horizontally. The memory footprint of Hello World becomes a big problem, because the number of copies you run will be proportional to the number of cores you have, not the number of machines. You get no benefit from moving from an 8 core box to 16 or 20 cores.
I suspect that if they do manage to pull off more concurrency in Ruby 3, vertically scaling machines will make more sense. If 8 cores can share one footprint, instead of needing one process per core, then the budget looks more attractive.
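To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it (300 MB per process, 20 MB of extra memory per thread) is a made-up assumption for illustration, not a measurement, and the method names are hypothetical.

```ruby
# All figures below are hypothetical, chosen only to make the arithmetic visible.

# One full process per core: memory grows with the core count.
def process_per_core_mb(cores:, per_process_mb:)
  cores * per_process_mb
end

# One shared process: a single base footprint plus a small per-thread cost.
def shared_process_mb(cores:, base_mb:, per_thread_mb:)
  base_mb + cores * per_thread_mb
end

[8, 16].each do |cores|
  separate = process_per_core_mb(cores: cores, per_process_mb: 300)
  shared = shared_process_mb(cores: cores, base_mb: 300, per_thread_mb: 20)
  puts "#{cores} cores: #{separate} MB as separate processes, #{shared} MB shared"
end
```

With these assumed inputs, doubling the cores doubles the first figure but adds comparatively little to the second, which is the whole argument for vertical scaling once real parallelism is available.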
So now might not be the right time to cherry-pick some of these features, but it may not be far off.
FWIW the GIL has been the GVL since YARV was merged in and Ruby became based on a virtual machine rather than being purely interpreted. I believe this was 1.9.
> because the number of copies you run will be proportional to the number of cores you have, not the number of machines
While this is true, Ruby is also very CoW optimized, so while forks grow linearly in size (with count), usually the first fork is drastically smaller than the process it was forked from.
I work at Heroku and recommend perf settings to customers. 5 years ago people were mostly hitting memory limits. Now it's pretty common to see apps that are maxing out the CPU well before coming close to ram limits.
Especially when compared to JavaScript, Ruby is extremely memory efficient.
I agree with your larger statement but wanted to chime in and expand on those two points.
CRuby could still be much better at CoW. In theory, a forked process only needs a similar memory allocation to a pthread. In practice the runtime writes to a bunch of these inherited pages and fucks it up. malloc-ed memory is usually bigger than the "Ruby heap", so that limits the impact you can have by trying not to write or re-write.
The high memory usage of Ruby still causes problems if the app is single-threaded. I scaled databases for Ruby apps for a living for almost 8 years, and sadly single-threaded legacy Ruby apps are still a thing.
Anyway, in the single-threaded scenario, the app may appear to be CPU bound in the steady state. However, when some hiccup happens in a database or in another microservice, all the Ruby processes could soon be blocked waiting for network responses. In this case, ideally there should be plenty of idle Ruby processes to absorb the load, but keeping them around is rather costly due to the high memory usage.
There are potential fixes of course, but with trade-offs:
- Aggressive timeout: May cause requests to fail under the steady state
- Circuit breaker: Difficult to tune the parameters, may not get triggered, or may keep the app degraded longer than necessary. Also not a good fit when the process is single-threaded, as it can only get one data point at a time.
- Burning money: Can only do this until we hit the CPU : memory ratio limit imposed by the cloud vendors.
- Multi-threading: Too late to do this with years of monkey-patching that expects the app to run single-threaded.
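For the circuit breaker option, a minimal in-process sketch might look like the following. The `CircuitBreaker` class, its `failure_threshold` and `reset_after` parameters, and the injectable `clock` are all hypothetical names made up for this example, not the API of any particular gem. Note that the state lives inside one process, which is exactly why a single-threaded worker only collects one data point at a time.

```ruby
# A minimal, hypothetical circuit breaker. State is per process, not shared.
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(failure_threshold:, reset_after:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold # consecutive failures before opening
    @reset_after = reset_after             # seconds to wait before a trial call
    @clock = clock                         # injectable so tests can control time
    @failures = 0
    @opened_at = nil
  end

  # Runs the block unless the circuit is open; counts failures and re-raises.
  def call
    raise OpenError, "circuit open" if open?

    begin
      result = yield
      @failures = 0
      result
    rescue StandardError
      @failures += 1
      @opened_at = @clock.call if @failures >= @failure_threshold
      raise
    end
  end

  # True while calls should be rejected. Once reset_after has elapsed, the
  # breaker lets calls through again, but a single failure reopens it.
  def open?
    return false unless @opened_at

    if @clock.call - @opened_at >= @reset_after
      @opened_at = nil
      @failures = @failure_threshold - 1
      false
    else
      true
    end
  end
end
```

A caller would wrap each outbound request, for example `breaker.call { client.get("/status") }` where `client` is whatever HTTP or database client the app already uses, and rescue `CircuitBreaker::OpenError` to fail fast. Picking `failure_threshold` and `reset_after` is the hard-to-tune part mentioned above.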
CRuby creates worker processes with fork(), and copy-on-write lets the child share the parent's memory pages until one of them writes to a page.
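A rough sketch of that preload-then-fork pattern, assuming a Unix-like system where `fork` is available. `SHARED_TABLE` and `run_workers` are hypothetical names for this example, and the sketch does not measure memory; it only shows children reading data the parent loaded once.

```ruby
# Load read-mostly data once in the parent, before any fork.
SHARED_TABLE = Array.new(100_000) { |i| "row-#{i}" }.freeze

# Forks `count` children; each reports the table size it sees over a pipe.
def run_workers(count)
  workers = Array.new(count) do |index|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      # The child reads the inherited table; nothing is reloaded or re-parsed.
      writer.puts "#{index}:#{SHARED_TABLE.size}"
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # the parent keeps only the read end
    [pid, reader]
  end

  workers.map do |pid, reader|
    line = reader.read.strip
    reader.close
    Process.wait(pid)
    line
  end
end

puts run_workers(2).inspect
```

How much memory actually stays shared depends on how many inherited pages the runtime writes to after the fork, so treat this as the shape of the pattern rather than a guarantee of savings.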
JRuby doesn't have a GIL so you only need a single process. Same with TruffleRuby.
With CRuby, you're much better to run a bigger container with multiple processes than one process per container.
With either NodeJS or CRuby you're still better off running fewer containers on bigger hosts. Each host has to duplicate the host OS and container infrastructure. Each container of a real production app also duplicates a bunch of stuff despite Docker's best attempts at sharing.
Some major differences here are how they interface with I/O and the mechanisms around memory sharing.
Node.js workers are more like Web Workers and mostly suitable for proper CPU-intensive parallelization, whereas in Ruby it's not uncommon to run e.g. a multithreaded web server in the same process and namespace.
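To illustrate the Ruby side: CRuby releases the GVL while a thread is sleeping or blocked on I/O, so threads in one process can overlap their waits even though only one runs Ruby code at a time. A small sketch, where `sleep` is a stand-in for a database or HTTP round trip and `fetch_all` is a hypothetical helper:

```ruby
# Starts one thread per delay and returns the results plus total wall time.
def fetch_all(delays)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  threads = delays.map do |delay|
    Thread.new do
      sleep(delay) # stand-in for waiting on a database or HTTP response
      delay
    end
  end

  results = threads.map(&:value) # value joins the thread and returns its result
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [results, elapsed]
end

results, elapsed = fetch_all([0.2, 0.2, 0.2, 0.2])
puts "#{results.size} waits finished in #{elapsed.round(2)}s"
```

The four waits overlap, so the wall time should land near the longest single wait rather than the sum. CPU-bound Ruby code in those threads would not get the same benefit under the GVL.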
That's rather vague. But yes, no matter which JIT you use, you always need some extra memory to run it: it creates a more optimized version of the code while still keeping the unoptimized version around, so it needs more memory.