| Just keep in mind the trade-offs that you are making. For starters, you are adding a considerable performance overhead to local pointer assignments; you're replacing a register-register move (and one that can be optimized away in many situations by a modern compiler through register renaming, turning it into a no-op) with a register-register move plus two memory updates plus a branch. Even where you have cache hits and the branch can be predicted correctly almost always, you're dealing with cache and BTB pollution; these effects may not fully show up in micro-benchmarks, but only completely manifest in large programs. Second, adding concurrency to the mix can complicate matters a lot (depending on your concurrency model [1]). You may need additional atomicity guarantees for the reference count updates – which, while cheaper than a full-blown lock, is generally still more expensive than the purely sequential case; you preclude compiler optimizations that can eliminate the overhead of pointer updates; and you're still lift with the potential for race conditions (e.g. when one thread writes to a global variable or heap location while another thread is retrieving a pointer from the same location, and that pointer is the only remaining reference to an object). Third, typical approaches to cycle detection and collection (such as trial deletion) reintroduce most of the challenges of tracing GC. While you may think that you can avoid cycles or handle them manually, they often crop up in unexpected places. A common case is when a closure is stored in an object that is also captured in the closure's environment [2]. I don't mean to discourage you from your decision – I don't know what your ultimate goals are and naive RC may very well be the best set of trade-offs you can make – I just want to alert you to possible trade-offs that you are making. [1] Some popular concurrency models (such as Erlang's, Dart's, or Nim's) avoid shared references, so these issues would not even crop up for such languages, of course. [2] I have a hypothesis that naive RC as an approach to memory management is particularly common in languages that do not (or historically, did not) support closures, but that's a different story. |
Would the above languages (Erlang, Dart and Nim) count as naive implementations?
I was under the impression Nim, at least, had pretty efficient RC implementation that performed favourably compared with a Boehm GC when considering single threaded performance. Is this accurate?