by dlikhten 5458 days ago
I fail to see how this is an all-hands, abandon-ship issue. If it's a critical issue in all 3 interpreters, they should be fixed ASAP if possible, at worst with a flag.

If Rubinius/IronRuby/JRuby have no issues, this may become moot eventually, as Rubinius has been gaining a lot of traction recently and gets faster with each release, outperforming the standard Ruby VMs in many cases.

2 comments

Neither Rubinius nor JRuby (and probably not IronRuby either) has this issue, because they all use accurate garbage collection rather than conservative. Accurate GC requires much more bookkeeping, since all pointers must always be properly identified, but if you design a system around accurate GC from the start, it's pretty easy. Bugs like this are a direct result of a conservative GC strategy (and these bugs, as I'm sure you got from reading Joe's post, really, really suck to find).
This class of subtle bugs exists whether or not your GC is accurate as soon as you take the red pill and leave the VM environment. If you forget to add your C pointer to the accurate GC's root set, you're just as dead. Related story: http://news.ycombinator.com/item?id=217189
But that is by definition a tractable problem, because the source will show that the root set isn't being used properly. (Additionally, in practice this proves to be a rare and easy-to-fix bug.)
I think the author has a valid point that the "conservative" garbage collection approach has a flaw in its assumptions about the behavior of C compiler optimizations, and it doesn't sound like something easy to fix without a rewrite (i.e. switching to "accurate" GC). This sort of flaw will continue producing new surprising bugs, potentially any time the code is changed, or any time the compiler's optimizations change. These sorts of bugs are frustrating to track down, because they depend both on details of code optimization, and on details of memory allocation/deallocation history. If you compile with debugging options, you may change what optimizations are used; if you insert debug prints for some old-school log-based analysis, you may change the allocation/deallocation history, so the GC gets triggered in a different place.
Right now, doesn't the GC traverse the entire heap and keep all objects where the memory's value looks like it might possibly be a pointer to some other object in memory?

This certainly isn't an awesome solution but couldn't the GC backtrace(3) the current process and look at %eax at all C stack frames to additionally include that value in the "pointers currently plausibly in flight" list?

The problem is this[1]: strings are compound objects, which use 2 memory allocations: one for the object representation, the other for the memory holding the character array. The problem arises when you are still accessing the character array but no longer need the string object itself. The C compiler notices that you don't use the pointer to the string object anymore, so it doesn't bother keeping it on the stack. It is allowed to do this. The GC's mark phase now runs; it inspects all the stack frames and the global roots. It detects that no references to the string object exist and decides to collect it. There happens to be a destructor function associated with that memory object, which frees the character array, as the character array is manually memory managed. It blows up when you then try to access that character array directly.[2]
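
To make the sequence of events concrete, here is a minimal toy simulation in Python, not the interpreter's actual code. Everything in it is an assumption made for illustration: addresses are plain integers, the "stack" is a list of words, and `Heap`, `new_string` and `collect` are hypothetical names. The mark step only looks at stack words and does not follow references between objects, which is enough to show the failure.

```python
# Toy model (hypothetical): addresses are integers, the "stack" is a list of words.
class Heap:
    def __init__(self):
        self.objects = {}      # address -> GC-managed string object
        self.buffers = {}      # address -> manually managed character array
        self.next_addr = 0x1000

    def _fresh_addr(self):
        addr = self.next_addr
        self.next_addr += 0x100
        return addr

    def new_string(self, text):
        # Two separate allocations: the character array and the object itself.
        buf = self._fresh_addr()
        self.buffers[buf] = list(text)
        obj = self._fresh_addr()
        self.objects[obj] = {"buf": buf}
        return obj

    def collect(self, stack_words):
        # Conservative mark: a stack word that equals an object address keeps it alive.
        live = {w for w in stack_words if w in self.objects}
        for addr in list(self.objects):
            if addr not in live:
                # Finalizer: the object owns its buffer, so the buffer is freed too.
                del self.buffers[self.objects[addr]["buf"]]
                del self.objects[addr]


heap = Heap()
s = heap.new_string("hello")
chars = heap.objects[s]["buf"]   # like taking the raw char pointer out of the string

# Unoptimized build: the object pointer is still on the stack next to the char pointer.
heap.collect(stack_words=[s, chars])
still_valid = chars in heap.buffers

# Optimized build: the compiler dropped the dead object pointer; only the char pointer remains.
heap.collect(stack_words=[chars])
dangling = chars not in heap.buffers
```

After the second collection `chars` refers to a buffer that no longer exists, which is the situation described above: the pointer you are still using has been freed underneath you.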

The correct way to handle this is to add the object reference to the GC's "root" set while you're using its guts, and remove it again when you're done.
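
A rough sketch of that discipline, again as a toy Python model rather than any real interpreter's API: `ToyGC`, `roots` and `pinned` are hypothetical names, and the context manager is just one way to guarantee that the root is removed again even if the code in between raises.

```python
from contextlib import contextmanager


class ToyGC:
    def __init__(self):
        self.objects = {}    # address -> payload
        self.roots = set()   # addresses explicitly registered as roots

    def collect(self, stack_words=()):
        # Live = explicit roots plus anything a stack word happens to point at.
        live = self.roots | {w for w in stack_words if w in self.objects}
        for addr in list(self.objects):
            if addr not in live:
                del self.objects[addr]

    @contextmanager
    def pinned(self, addr):
        # Add the object to the root set while its internals are in use...
        self.roots.add(addr)
        try:
            yield self.objects[addr]
        finally:
            # ...and remove it again when done, even on an exception.
            self.roots.discard(addr)


toy = ToyGC()
toy.objects[0x1000] = {"chars": list("hello")}

with toy.pinned(0x1000) as string_obj:
    toy.collect()                          # no stack word refers to the object
    survived_while_pinned = 0x1000 in toy.objects

toy.collect()
collected_after_unpin = 0x1000 not in toy.objects
```

Note that this sketch uses a plain set, so nested pins of the same address are not counted; a real root set would need to handle that case.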

Another possible solution is to allocate the string object and its character representation in one chunk of memory. This only works for immutable strings which never share substructure, though. The reason this works is that most conservative GCs will consider objects live as long as there is a pointer pointing to somewhere within a chunk of memory, not necessarily at the beginning.
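
The interior-pointer rule can be sketched as follows. This is a toy Python model under stated assumptions: `Chunk`, `live_chunks` and `HEADER_SIZE` are hypothetical, and the 16-byte header is an arbitrary number chosen for the example.

```python
class Chunk:
    """One contiguous allocation: [start, start + size)."""

    def __init__(self, start, size):
        self.start = start
        self.size = size

    def contains(self, word):
        # An interior pointer counts: it only has to land somewhere inside the chunk.
        return self.start <= word < self.start + self.size


HEADER_SIZE = 16  # hypothetical size of the string header, in bytes


def live_chunks(chunks, stack_words):
    # Conservative rule: a chunk is live if any stack word points into it.
    return [c for c in chunks if any(c.contains(w) for w in stack_words)]


# Header and characters share one chunk, so a pointer to the characters
# is an interior pointer into the same allocation as the object.
string_chunk = Chunk(start=0x2000, size=HEADER_SIZE + 6)
other_chunk = Chunk(start=0x3000, size=32)
chars_ptr = string_chunk.start + HEADER_SIZE

live = live_chunks([string_chunk, other_chunk], stack_words=[chars_ptr])
```

Here the only word on the stack is the character pointer, yet the whole string chunk stays live, because that pointer falls inside the chunk's address range.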

[1] note: I'm not a Ruby coder but I fixed a very similar problem in a Lua implementation about 4 years ago. That one wasn't even conservative GC. EDIT: I told the story of that bug on HN 3 years (!) ago http://news.ycombinator.com/item?id=217189

[2] worse, it probably doesn't blow up immediately and instead causes memory corruption.