Hacker News new | ask | show | jobs
by mortoray 3682 days ago
In my quest for a memory management scheme on Leaf it's your final point which is primarily forcing my decision: the compatibility with other memory managers. In this article I just wish to provide counterpoint to those people trying to convince me that tracing GC is a better approach. In the end, it's the need to right libraries and be compatible that prevents me from using a complex GC.
1 comments

Just keep in mind the trade-offs that you are making.

For starters, you are adding a considerable performance overhead to local pointer assignments; you're replacing a register-register move (and one that can be optimized away in many situations by a modern compiler through register renaming, turning it into a no-op) with a register-register move plus two memory updates plus a branch. Even where you have cache hits and the branch can be predicted correctly almost always, you're dealing with cache and BTB pollution; these effects may not fully show up in micro-benchmarks, but only completely manifest in large programs.

Second, adding concurrency to the mix can complicate matters a lot (depending on your concurrency model [1]). You may need additional atomicity guarantees for the reference count updates – which, while cheaper than a full-blown lock, is generally still more expensive than the purely sequential case; you preclude compiler optimizations that can eliminate the overhead of pointer updates; and you're still lift with the potential for race conditions (e.g. when one thread writes to a global variable or heap location while another thread is retrieving a pointer from the same location, and that pointer is the only remaining reference to an object).

Third, typical approaches to cycle detection and collection (such as trial deletion) reintroduce most of the challenges of tracing GC. While you may think that you can avoid cycles or handle them manually, they often crop up in unexpected places. A common case is when a closure is stored in an object that is also captured in the closure's environment [2].

I don't mean to discourage you from your decision – I don't know what your ultimate goals are and naive RC may very well be the best set of trade-offs you can make – I just want to alert you to possible trade-offs that you are making.

[1] Some popular concurrency models (such as Erlang's, Dart's, or Nim's) avoid shared references, so these issues would not even crop up for such languages, of course.

[2] I have a hypothesis that naive RC as an approach to memory management is particularly common in languages that do not (or historically, did not) support closures, but that's a different story.

What's the difference been naive RC and non-naive?

Would the above languages (Erlang, Dart and Nim) count as naive implementations?

I was under the impression Nim, at least, had pretty efficient RC implementation that performed favourably compared with a Boehm GC when considering single threaded performance. Is this accurate?

By naive I refer to RC implementations that simply instrument pointer assignments with the necessary reference counting instructions. Non-naive RC implementations try to minimize the overhead associated with it; deferred reference counting or having the compiler optimize away unnecessary reference count updates are typical approaches.

Erlang and Dart have tracing GCs; I was referring to their concurrency model (which uses thread-local heaps), not their method of garbage collection. Nim uses either deferred reference counting or a simple mark-and-sweep collector, depending on a compile time switch, but also uses thread-local heaps. The main attraction of Nim's deferred RC collector is that it can keep pause times very low; I haven't actually benchmarked its amortized performance, but I expect it should be competitive with or better than the Boehm GC, especially for large heaps.

Thanks for your answer. "Garbage collection" seems to cover a huge range of actual implementation approaches and sometimes it's hard to see the pros and cons of each approach when they're all lumped under the same thing.

In my mind, 'stop the world' pauses are the biggest disadvantage of some GC approaches, so it's nice to hear that deferred RC ameliorates that to some extent.