|
|
|
|
|
by tychver
3123 days ago
|
|
This is basically my job at ChartMogul and we've pretty much solved this problem. The two biggest issues for us were: Ruby prefers to grow the heap really quickly rather than spend much time in garbage collection. You can turn this growth factor down at runtime using an enviroment variable. The second problem is importing a huge chunk of rows at once means they have to exist in RAM at the same time. Use batched iterators to reduce peak memory usage. All GCed languages have this problem, Go included. You'd think Go's GC was somehow revolutionary from the way they talk about it, but it's basically the same as the Ruby GC, plus a little more parallelism. What helps Go is that the compiler can optimise and use stack allocation and re-use temprory variables. If it fails, it causes a nightmare, and the Go standard library is full of tricks to convince the compiler to do stack allocation. Java, OTOH, has compacting garbace collection, so after high peak memory usage, it can release the memory back to the OS. Aaron Patterson has been working on doing the same for Ruby. If you use JRuby, you'll get this right now, plus it's about 3x faster for shuffling data around. |
|