Hacker News new | ask | show | jobs
by tychver 3123 days ago
This is basically my job at ChartMogul and we've pretty much solved this problem. The two biggest issues for us were: Ruby prefers to grow the heap really quickly rather than spend much time in garbage collection. You can turn this growth factor down at runtime using an enviroment variable.

The second problem is importing a huge chunk of rows at once means they have to exist in RAM at the same time. Use batched iterators to reduce peak memory usage. All GCed languages have this problem, Go included.

You'd think Go's GC was somehow revolutionary from the way they talk about it, but it's basically the same as the Ruby GC, plus a little more parallelism. What helps Go is that the compiler can optimise and use stack allocation and re-use temprory variables. If it fails, it causes a nightmare, and the Go standard library is full of tricks to convince the compiler to do stack allocation.

Java, OTOH, has compacting garbace collection, so after high peak memory usage, it can release the memory back to the OS. Aaron Patterson has been working on doing the same for Ruby. If you use JRuby, you'll get this right now, plus it's about 3x faster for shuffling data around.

1 comments

Another difference (which, as I recall, mattered for us) is that class objects in Ruby take up quite a bit of permanent space in memory.
Ruby is actually pretty good in this regard. If you define your module or class anonymously, but give it a name using a constant, Ruby will GC it when possible. The standard way of defining modules and classes obviously means they can never be GCed.

Java doesn't, or at least didn't, collect anonymous classes without an additional GC flag being enabled, which could bite you quite hard using JRuby with gems which made heavy use of anonymous classes.

In practice, modules and classes are not defined that way - likely not in your own application, and certainly not in all the gems you depend on, or in the Rails framework itself. That entire set of transitive dependencies can take up a lot of memory.
The class definitions will be a tiny fraction of memory usage. A template Rails app memory usage only has about 20% managed by Ruby. The rest is the VM, C libraries, maybe long strings etc. Definitely not class definitions pulled in from gems.