Hacker News new | ask | show | jobs
by kmeisthax 242 days ago
What you're describing is not just a problem with GC, but pointers in general. Optimizers would choke on exactly the same scheme.

What compiler writers realized is that pointers are actually not integers, even though we optimize them down to be integers. There's extra information in them we're forgetting to materialize in code, so-called "pointer provenance", that optimizers are implicitly using when they make certain obvious pointer optimizations. This would include the original block of memory or local variable you got the pointer from as well as the size of that data.

For normal pointer operations, including casting them to integers, this has no bearing on the meaning of the program. Pointers can lower to integers. But that doesn't mean constructing a new pointer from an integer alone is a sound operation. That is to say, in your example, recovering the integer portion of y and casting it to a pointer shouldn't be allowed.

There are two ways in which the casting of integers to pointers can be made a sound operation. The first would be to have the programmer provide a suitably valid pointer with the same or greater provenance as the one that provided the address. The other, which C/C++ went with for legacy reasons, is to say that pointers that are cast to integers become 'exposed' in such a way that casting the same integer back to a pointer successfully recovers the provenance.

If you're wondering, Rust supports both methods of sound int-to-pointer casts. The former is uninteresting for your example[0], but the latter would work. The way that 'exposed provenance' would lower to a GC system would be to have the GC keep a list of permanently rooted objects that have had their pointers cast to integers, and thus can never be collected by the system. Obviously, making pointer-to-integer casts leak every allocation they touch is a Very Bad Idea, but so is XORing pointers.

Ironically, if Alloy had done what other Rust GCs do - i.e. have a dedicated Collect trait - you could store x and x^y in a single newtype that transparently recovers y and tells the GC to traverse it. This is the sort of contrived scenario where insisting on API changes to provide a precise collector actually gets what a conservative collector would miss.

[0] If you're wondering what situations in which "cast from pointer and int to another pointer" would be necessary, consider how NaN-boxing or tagged pointers in JavaScript interpreters might be made sound.