Hacker News new | ask | show | jobs
by ruggeri 5282 days ago
I enjoyed this basic introduction to GC; upvoted. What I'd look forward is further discussion of incremental and concurrent GC algorithms.

Until then, does anyone know what makes concurrent GC non-trivial? It seems like it shouldn't be too hard to trace concurrent to program execution. And if you don't compact, collection seems to just involve updating some structure tracking free blocks. I'd imagine it's possible to write a thread-safe version of that structure where every "free" request doesn't need to block every "malloc" request. But I must have missed something.

I'd also be interested to read how compaction works; how are references remapped from the old address to the new one? Is it possible that a reference value is a pointer to a reference "object" which contains the pointer to data, which needs to be updated? Then you only need to update a single pointer when moving data, but every dereferene incurs an extra layer of indirection.

1 comments

Concurrent GC is somewhat non-trival as the mutator (i.e. the program you write) can delete references to objects from an area of the heap that has not been examined by the GC and introduce references to those same objects in an area that has been examined. This means those objects will be reclaimed since the GC never saw them.

To cope with this the mutator is modified at compile/run time to inform the GC of object updates that could lead to this situation.

During compaction of any sort, the first time an object is encountered that is to be moved it is copied somewhere else and the old copy's header is overwritten with it's forwarding address. Every time the GC encounters a reference to the old object, it re-writes the reference to the old object with it's new location (conveniently located in the old copy's header).

> Is it possible that a reference value is a pointer to a reference "object" which contains the pointer to data, which needs to be updated?

Look up something called 'Brook's style forwarding pointer', it is essentially what you've described.

Thanks; upvoted! I think I'm beginning to appreciate that concurrent marking is tougher than I thought; specifically, I can see how it can be hard to prove an object is unreachable. So I imagine the marking side is where the difficulties are.

Your explanation for compaction makes perfect sense. Of course, this won't work trivially concurrently. Only if you stop the world can you complete examination of the entire live heap and know you've updated all references to the moved object and can collect the original space.

> I can see how it can be hard to prove an object is unreachable.

The difficulty with concurrent GC is not to prove an object is unreachable but to prove is likely to be reachable. For instance Yusa-type (or snapshot-at-the-beginning) concurrent collectors have a tendency keep dead objects alive. This 'floating garbage' is a small price to pay for a fairly simple correctness proof of never dropping a reachable object.

It's no harm for a GC to keep some dead objects alive but dropping something reachable is completely unacceptable.

> Of course, this won't work trivially concurrently. Only if you stop the world [..] can collect the original space.

Yes, moving objects concurrently requires some serious thinking to get right. Sun's G1 collector does this by recording the location of pointer updates that point into regions which are to be compacted. This pushes the pause times down to the length of time to move objects and update references to them.

If you're real serious about moving objects concurrently, see "A Study of Concurrent Real-Time Garbage Collectors", Pizlo et al, 2008.