Hacker News
by iam 5458 days ago
I think this is a problem that exists across any VM that implements a GC, not just Ruby.

The .NET CLR has the exact same problem (perhaps a harder one, since the CLR has a moving GC), so any time its native code touches GC references (pointers to objects that are collectible), the reference is always wrapped in an explicit GC stack frame (think of a GC struct that lives on the stack). Furthermore, all reads/writes are carefully done with macros (which of course expand to volatile + some other stuff) to make sure the compiler doesn't optimize them away.
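To make the "GC struct that lives on the stack" idea concrete, here is a minimal sketch of such a frame. Every name in it (Object, GCFrame, push_frame, protect, mark_from_frames) is made up for illustration and is not taken from the CLR source; it is single-threaded and leaves out the volatile access macros entirely.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy object; a real runtime would have a proper object header.
struct Object { int value; bool marked; };

// A frame records the ADDRESSES of local references, so a collector can
// find them (and a moving collector could rewrite them in place).
struct GCFrame {
    GCFrame* prev;        // next-older frame in the chain
    Object** slots[4];    // addresses of protected local references
    std::size_t count;    // number of slots in use
};

// Head of the chain of active frames (assumption: one thread only).
static GCFrame* g_top_frame = nullptr;

void push_frame(GCFrame& f) { f.prev = g_top_frame; f.count = 0; g_top_frame = &f; }
void pop_frame(GCFrame& f) { assert(g_top_frame == &f); g_top_frame = f.prev; }
void protect(GCFrame& f, Object*& ref) { assert(f.count < 4); f.slots[f.count++] = &ref; }

// Mark phase: walk the registered frames instead of scanning the C stack.
std::size_t mark_from_frames() {
    std::size_t marked = 0;
    for (GCFrame* f = g_top_frame; f != nullptr; f = f->prev)
        for (std::size_t i = 0; i < f->count; ++i)
            if (Object* o = *f->slots[i]) { o->marked = true; ++marked; }
    return marked;
}

// Returns how many objects the toy collector could see from this function.
std::size_t demo_frame() {
    Object a{1, false}, b{2, false};
    Object* guarded = &a;
    Object* unguarded = &b;  // never registered: invisible to the collector
    GCFrame frame;
    push_frame(frame);
    protect(frame, guarded);
    std::size_t marked = mark_from_frames();
    pop_frame(frame);
    (void)unguarded;
    return marked;
}
```

In this sketch only the registered reference is found by the mark phase; the unregistered one is exactly the kind of mistake that goes unnoticed until the object is collected underneath you.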

On the one hand, this is nice because they don't need to scan the C stack (the collector scans the VM stack and the fake GC frames -- well, it's one stack, but the native C frames are skipped). On the other hand, it means that any time a GC object is used in C code (OK, actually it's C++), they have to be really careful to guard it.

Of course, bugs crop up all the time where an object gets collected when it shouldn't have been. It happens so often that there is a name for it -- "GC Hole".

Astute readers and users of p/invoke may remark that they don't have to set up any "GC frames" -- that is because this complicated scheme is not exposed outside of the CLR source. Regular users of .NET who want to marshal pointers between native/managed can simply request that a GC reference gets pinned, at which point I'm mostly sure it won't get collected until it's unpinned.

The bad news is I'm almost positive there is nothing you can do with just C here to make this problem go away. You'd want stuff to magically just happen under the hood, and C++ is the right way to go for that.

It's probably possible to create an RAII-style C++ GC smart pointer that would be 99% foolproof at the expense of some performance. It gets a little bit trickier with a moving collector. I am thinking it could ref/unref at creation/destruction, and disallow any direct raw pointer usage so you can't shoot yourself in the foot.
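A rough sketch of what such a smart pointer could look like, assuming the runtime exposes some way to root and unroot an object. Here gc_ref, gc_unref, GcObject and GcPtr are hypothetical names, and the root count is stored on the object purely to keep the example self-contained.

```cpp
#include <cassert>
#include <utility>

// Hypothetical collectible object; root_count stands in for the runtime's
// own bookkeeping of "something native is holding this".
struct GcObject { int value = 0; int root_count = 0; };

// Assumed runtime hooks (not a real API): root and unroot an object.
inline void gc_ref(GcObject* o) { if (o) ++o->root_count; }
inline void gc_unref(GcObject* o) {
    if (o) { assert(o->root_count > 0); --o->root_count; }
}

class GcPtr {
public:
    explicit GcPtr(GcObject* o = nullptr) : obj_(o) { gc_ref(obj_); }
    GcPtr(const GcPtr& other) : obj_(other.obj_) { gc_ref(obj_); }
    GcPtr(GcPtr&& other) noexcept : obj_(other.obj_) { other.obj_ = nullptr; }
    // Copy-and-swap: the by-value parameter releases the old object on exit.
    GcPtr& operator=(GcPtr other) noexcept { std::swap(obj_, other.obj_); return *this; }
    ~GcPtr() { gc_unref(obj_); }

    // No get(): member access goes through the wrapper. A determined caller
    // can still extract the pointer via operator->, hence "99%" foolproof.
    GcObject* operator->() const { return obj_; }
    explicit operator bool() const { return obj_ != nullptr; }

private:
    GcObject* obj_;
};

// Returns the root count while exactly one GcPtr still holds the object.
int demo_gcptr(GcObject& obj) {
    GcPtr a(&obj);            // rooted once
    {
        GcPtr b = a;          // rooted twice while b is alive
        b->value = 42;
    }                         // b destroyed, back to one root
    GcPtr c = std::move(a);   // ownership moves, count unchanged
    return obj.root_count;
}
```

This only covers a non-moving collector. With a moving collector the runtime would presumably need to know where each GcPtr lives so it can update the stored address, which is where it gets trickier.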

Of course, the people writing the GC still need to worry about this.

3 comments

Anyone who has written an extension to a garbage-collected language in C will have run into this issue. Personally I've written extensions for Guile, OCaml, Ruby, MLton, and Java, and all of them have tricky rules for making your C code safe for garbage collection. Using volatile is the wrong way to do this though... this tells me that the people figuring this stuff out for Ruby don't really know C that well.
A very similar pattern bit me in the ass with the ObjC GC and libevent.
There are other ways to structure the VM's API so that all VM objects are connected to VM data structures at all times. A good example is Lua, where you manipulate Lua objects on the Lua stack - they are never referred to by a raw C pointer.
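To illustrate the stack-based shape of such an API, here is a toy sketch. It is not Lua's actual C API: ToyVM and all of its methods are invented for this example. The point is only that native code refers to values by stack position and never holds a pointer into the collected heap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy VM: every value lives in a VM-owned heap, and native
// code can only reach values through positions on the VM stack.
class ToyVM {
public:
    // Create a VM-owned string and push it; no pointer is handed back.
    void push_string(const std::string& s) { stack_.push_back(alloc(s)); }

    // Pop the top two strings and push their concatenation.
    void concat() {
        assert(stack_.size() >= 2);
        std::string joined = heap_[stack_[stack_.size() - 2]].text
                           + heap_[stack_.back()].text;
        stack_.pop_back();
        stack_.pop_back();
        stack_.push_back(alloc(joined));
    }

    // Copy a value out; from_top == 0 means the top of the stack.
    std::string to_string(std::size_t from_top) const {
        assert(from_top < stack_.size());
        return heap_[stack_[stack_.size() - 1 - from_top]].text;
    }

    // Mark everything reachable from the stack, free the rest.
    // Returns the number of cells freed.
    std::size_t collect() {
        for (Cell& c : heap_) c.marked = false;
        for (std::size_t idx : stack_) heap_[idx].marked = true;
        std::size_t freed = 0;
        for (Cell& c : heap_) {
            if (c.live && !c.marked) { c.live = false; c.text.clear(); ++freed; }
        }
        return freed;
    }

private:
    struct Cell { std::string text; bool live; bool marked; };

    std::size_t alloc(const std::string& s) {
        heap_.push_back(Cell{s, true, false});
        return heap_.size() - 1;
    }

    std::vector<Cell> heap_;          // VM-owned objects
    std::vector<std::size_t> stack_;  // the only roots: indices into heap_
};

// Builds a string on the VM stack, then collects; returns cells freed.
std::size_t demo_stack_api(ToyVM& vm) {
    vm.push_string("GC ");
    vm.push_string("hole");
    vm.concat();          // the two inputs are now unreachable
    return vm.collect();  // the result survives because it is on the stack
}
```

Because the stack is the only root set, the collector never has to guess what native code is holding, and there is nothing for the C compiler to optimize away.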