by ajakate 1916 days ago
Maybe only a little related, but in the five-ish years of ruby dev I've done, there was only one time I can remember interacting with the GC directly in production code.

It was in the context of a Sidekiq job that imported customer data from a CSV file. We would read in the CSV and, for each row, run a lot of complicated logic that translated the data from the customer's format into ours and decided how to update different tables in the db. These files were sometimes 10k lines or longer (all handled by a single Sidekiq job), and memory would balloon so much that Sidekiq would crash and keep trying to restart the job. For each row we were instantiating an ActiveModel object that had a lot of attributes/functions. I think the right solution would probably have been a (fairly heavy) refactor of that area of the code to spin up a separate job for each row, but we found that by running GC.start every few rows we were able to clean up some of the old AM objects and keep memory usage low for the time being...
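For anyone curious what that workaround looks like, here is a minimal sketch, not the original code. `process_row`, `import_csv`, and the `GC_EVERY` batch size are all hypothetical names and values made up for illustration; the only real calls are `CSV.new(...).each` and `GC.start`.

```ruby
require "csv"

# Hypothetical stand-in for the per-row translation and db update logic.
def process_row(row)
  row.to_h.transform_values { |value| value.to_s.strip }
end

# Assumed batch size; the comment above only says "every few rows".
GC_EVERY = 500

# Reads rows one at a time and forces a full GC run every `gc_every` rows.
# Returns the number of rows processed.
def import_csv(io, gc_every: GC_EVERY)
  processed = 0
  CSV.new(io, headers: true).each do |row|
    process_row(row)
    processed += 1
    GC.start if (processed % gc_every).zero?
  end
  processed
end
```

Calling it as `File.open("customers.csv") { |f| import_csv(f) }` (file name assumed) keeps everything in a single job, which is the point of the stopgap: it trades some CPU time on forced collections for a lower memory peak.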

2 comments

Mirrors my experience as well. Nearly a decade of working on large Ruby/Rails apps, some with very complex reporting / data processing flows (talking like billions of db rows processed in streaming queries, media encoding, etc) and a particular CSV processing situation like yours was the only time I needed to manually trigger GC... and even that was just triggering it, not even tweaking it.

The defaults seem very good, even at scale.

Part of that is because Ruby has a very predictable, but slower, GC. Java, on the other hand, has multiple garbage collectors... some are optimized for high throughput/spikes but are much harder to predict.

That's interesting!

I've been doing Ruby since 2014. Mostly Rails, but also a bunch of data processing.

I have run into memory issues at times, when shuffling large amounts of data around. But manually running GC was never the answer in my cases.

In all cases, the memory issues were because I'd created a bunch of heavy objects that were still in-scope and were therefore not eligible to be cleaned up by GC anyway.
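A rough sketch of that difference, under made-up names (`HeavyRecord`, `total_retaining`, and `total_streaming` are hypothetical, not from any real app): both methods compute the same result, but the first keeps every object reachable through an array, so no amount of GC.start would free them while it runs.

```ruby
# Hypothetical heavy per-row object.
HeavyRecord = Struct.new(:payload)

# Builds all records up front. The `records` array references every one of
# them, so none can be collected until the method returns.
def total_retaining(rows)
  records = rows.map { |row| HeavyRecord.new(row * 1_000) }
  records.sum { |record| record.payload.bytesize }
end

# Builds one record per iteration and keeps only the running total, so each
# record becomes unreachable (and collectable) once its iteration ends.
def total_streaming(rows)
  total = 0
  rows.each do |row|
    record = HeavyRecord.new(row * 1_000)
    total += record.payload.bytesize
  end
  total
end
```

In the second shape the GC can usually keep up on its own, which fits with manual GC runs never being the answer in these cases.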

This was all Ruby 2.0+ and most of the heavy data processing stuff was 2.3+. So I wasn't doing any of it back in the days of really ancient Ruby GC.

I've done a lot of similar work and learned a lot of similar lessons. They were interesting and fun challenges but I've since moved on from Ruby in my professional life.

I'll say this much: When I was working on these applications, one of the minor wins that I had was swapping them over to the jemalloc memory allocator. It has introspection/instrumentation tooling that is really useful for these sorts of situations. You can use `MALLOC_CONF` [0] to trigger some built-in profiling. For instance, `export MALLOC_CONF='prof_leak:true,lg_prof_sample:0,prof_final:true'` will trigger jemalloc to log the heap at exit which is very useful for tracking down leaks.

[0]: https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Leak-C...