by gwbas1c 254 days ago
> Having acknowledged that pointers can be 'disguised' as integers, it is then inevitable that Alloy must be a conservative GC

C# / dotnet don't have this issue. The few times I've needed a raw pointer to an object, first I had to pin it, and then I had to make sure that I kept a live reference to the object while native code had its pointer. This is "easier done than said" because most of the time it's passing strings to native APIs, where the memory isn't retained outside of the function call, and there is always a live reference to the string on the stack.

That being said, because GC (in this implementation) is opt-in, I probably wouldn't mix GC and pointers. It's probably easier to drop the requirement to get a pointer to a GC<T> instead of trying to work around such a narrow use case.


Also, in the long run Rust is not going to accept that pointers can be disguised as integers. There is this thing called pointer provenance, and some day all pointers will be required either to have provenance (i.e. a proof of where they came from) or to admit that POOF, this is a pointer out of thin air, and you can't assume anything about the pointee. As long as there are no POOF magicians, the GC can assume that it knows every reference!

> As long as there are no POOF magicians, the GC can assume that it knows every reference!

Creating pointers without provenance is safe, so the GC can't assume that a program won't have them and still be sound. This will always be an issue.

I wouldn't say always: https://doc.rust-lang.org/std/ptr/index.html#strict-provenan...

I don't know what the plan is but I wouldn't be surprised if there's a breaking change (maybe in an edition) to remove exposed provenance from Rust entirely.

Even their so-called conservative assumption is insufficient.

> if a machine word's integer value, when considered as a pointer, falls within a GCed block of memory, then that block itself is considered reachable (and is transitively scanned). Since a conservative GC cannot know if a word is really a pointer, or is a random sequence of bits that happens to be the same as a valid pointer, this over-approximates the live set
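The quoted rule can be sketched as a toy model. This is not Alloy's implementation: `Block`, `conservatively_reachable`, and the addresses are hypothetical names and values chosen for illustration, and the transitive scan of block contents is left out.

```rust
/// A toy heap block covering the address range [start, start + len).
#[derive(Debug, Clone, Copy)]
struct Block {
    start: usize,
    len: usize,
}

impl Block {
    /// True if `word`, read as an address, falls inside this block.
    fn contains(&self, word: usize) -> bool {
        word >= self.start && word - self.start < self.len
    }
}

/// Returns the indices of blocks that some root word appears to point into.
/// Transitive scanning of block contents is omitted for brevity.
fn conservatively_reachable(blocks: &[Block], root_words: &[usize]) -> Vec<usize> {
    let mut live = Vec::new();
    for (i, block) in blocks.iter().enumerate() {
        if root_words.iter().any(|&w| block.contains(w)) {
            live.push(i);
        }
    }
    live
}

fn main() {
    let blocks = [
        Block { start: 0x1000, len: 64 },
        Block { start: 0x2000, len: 64 },
    ];
    // 0x1010 stands for a real interior pointer; 0x2008 stands for a plain
    // integer that happens to land inside the second block. The scan cannot
    // tell them apart, so both blocks are kept alive.
    let roots = [0x1010, 0x2008, 42];
    let live = conservatively_reachable(&blocks, &roots);
    assert_eq!(live, vec![0, 1]);
    println!("blocks treated as live: {:?}", live);
}
```

Keeping the second block alive because of a coincidental integer is the over-approximation the quote describes.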

Suppose I allocate two blocks of memory, convert their pointers to integers, then store the values `x` and `x^y`. At this point, no machine word points to the second allocation, and so the GC would consider the second allocation to be unreachable. However, the value `y` could be computed as `x ^ (x^y)`, converted back to a pointer, and accessed. Therefore, their reachability analysis would under-approximate the live set.
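A minimal sketch of the XOR scheme described above, using plain `as` casts. The helper names `hide` and `recover` are made up for this example, and the program runs without a GC, so it only shows that the second pointer can be rebuilt from the two stored words.

```rust
/// Hide `b` by keeping only `a`'s address and the XOR of both addresses.
fn hide(a: *mut u32, b: *mut u32) -> (usize, usize) {
    let x = a as usize; // pointer-to-integer cast
    let y = b as usize;
    (x, x ^ y)
}

/// Recover the hidden pointer from the two stored words.
fn recover(x: usize, x_xor_y: usize) -> *mut u32 {
    (x ^ x_xor_y) as *mut u32 // integer-to-pointer cast
}

fn main() {
    let a = Box::into_raw(Box::new(1u32));
    let b = Box::into_raw(Box::new(2u32));

    // In the scenario from the comment, only these two words would be kept;
    // neither of them is the address of the second allocation.
    let (x, mask) = hide(a, b);

    let b_again = recover(x, mask);
    assert_eq!(b_again as usize, b as usize);
    assert_eq!(unsafe { *b_again }, 2);

    // Free both allocations exactly once.
    unsafe {
        drop(Box::from_raw(a));
        drop(Box::from_raw(b_again));
    }
}
```

A scanner that only looks for words falling inside a block would see `x` but never `mask` as a pointer, which is the under-approximation being described.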

If pointers and integers can be freely converted to each other, then the GC would need to consider not just the integers that currently exist, but also every integer that could be produced from the integers that currently exist.

> If pointers and integers can be freely converted to each other

In Rust, you can only freely convert integers to pointers under "exposed provenance" (plain `as` casts, or the explicit APIs stabilized in Rust 1.84), and its exact semantics are still not fully specified.

https://doc.rust-lang.org/std/ptr/index.html#exposed-provena...

I find the idea of provenance a bit abstract, so it's a lot easier to think about a concrete pointer system that has "real" provenance: CHERI. In CHERI all pointers are capabilities with a "valid" tag bit (it's out-of-band, so you can't just set it to 1 arbitrarily). As soon as you start doing raw bit manipulation of the address, the tag is cleared and it can no longer be used as a pointer. So this problem doesn't exist on CHERI.

Also, the problem of mistaking integers for pointers when scanning doesn't exist either: you can instead just search for memory where the tag bit is set.

What you're describing is not just a problem with GC, but pointers in general. Optimizers would choke on exactly the same scheme.

What compiler writers realized is that pointers are actually not integers, even though we optimize them down to be integers. There's extra information in them we're forgetting to materialize in code, so-called "pointer provenance", that optimizers are implicitly using when they make certain obvious pointer optimizations. This would include the original block of memory or local variable you got the pointer from as well as the size of that data.

For normal pointer operations, including casting them to integers, this has no bearing on the meaning of the program. Pointers can lower to integers. But that doesn't mean constructing a new pointer from an integer alone is a sound operation. That is to say, in your example, recovering the integer portion of y and casting it to a pointer shouldn't be allowed.

There are two ways in which the casting of integers to pointers can be made a sound operation. The first would be to have the programmer provide a suitably valid pointer with the same or greater provenance as the one that provided the address. The other, which C/C++ went with for legacy reasons, is to say that pointers that are cast to integers become 'exposed' in such a way that casting the same integer back to a pointer successfully recovers the provenance.

If you're wondering, Rust supports both methods of sound int-to-pointer casts. The former is uninteresting for your example[0], but the latter would work. The way that 'exposed provenance' would lower to a GC system would be to have the GC keep a list of permanently rooted objects that have had their pointers cast to integers, and thus can never be collected by the system. Obviously, making pointer-to-integer casts leak every allocation they touch is a Very Bad Idea, but so is XORing pointers.
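For reference, the two kinds of int-to-pointer cast can be sketched with the standard library's provenance APIs. This assumes Rust 1.84 or later, where `addr`, `with_addr`, `expose_provenance`, and `std::ptr::with_exposed_provenance_mut` are stable; the wrapper function names are made up for the example.

```rust
use std::ptr;

/// Method 1 (strict provenance): rebuild the pointer from an address plus an
/// existing pointer that supplies the provenance.
fn rebuild_with_provenance(base: *mut u32, addr: usize) -> *mut u32 {
    base.with_addr(addr)
}

/// Method 2 (exposed provenance): rebuild the pointer from the address alone,
/// relying on the provenance having been exposed earlier.
fn rebuild_from_exposed(addr: usize) -> *mut u32 {
    ptr::with_exposed_provenance_mut::<u32>(addr)
}

fn main() {
    let mut pair = [10u32, 20u32];
    let base: *mut u32 = pair.as_mut_ptr();

    // Method 1: only the address travels through integer land; `base`
    // provides the provenance for the whole array.
    let second_addr = base.addr() + std::mem::size_of::<u32>();
    let p1 = rebuild_with_provenance(base, second_addr);
    assert_eq!(unsafe { *p1 }, 20);

    // Method 2: converting to an integer explicitly exposes the provenance,
    // so the same integer can later be turned back into a usable pointer.
    let exposed: usize = base.expose_provenance();
    let p2 = rebuild_from_exposed(exposed);
    assert_eq!(unsafe { *p2 }, 10);
}
```

In the GC reading given above, the `expose_provenance` call is the point at which a collector would have to root the allocation permanently.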

Ironically, if Alloy had done what other Rust GCs do - i.e. have a dedicated Collect trait - you could store x and x^y in a single newtype that transparently recovers y and tells the GC to traverse it. This is the sort of contrived scenario where insisting on API changes to provide a precise collector actually gets what a conservative collector would miss.
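A rough sketch of that newtype idea follows. The `Trace` trait here is hypothetical, loosely modelled on the tracing traits of precise Rust GC crates and not taken from Alloy or any specific library, and addresses are plain `usize` values to keep the example safe.

```rust
/// Hypothetical tracing trait; a stand-in for a precise collector's API.
trait Trace {
    /// Report every heap address this value keeps alive.
    fn trace(&self, visit: &mut dyn FnMut(usize));
}

/// Stores `x` and `x ^ y` instead of the two addresses directly.
struct XorPair {
    x: usize,
    x_xor_y: usize,
}

impl XorPair {
    fn new(x: usize, y: usize) -> Self {
        XorPair { x, x_xor_y: x ^ y }
    }

    /// Recover the hidden second address.
    fn y(&self) -> usize {
        self.x ^ self.x_xor_y
    }
}

impl Trace for XorPair {
    fn trace(&self, visit: &mut dyn FnMut(usize)) {
        visit(self.x);
        visit(self.y()); // recovered transparently for the collector
    }
}

/// Collect the addresses a value reports, as a collector's mark phase might.
fn traced_addresses(value: &dyn Trace) -> Vec<usize> {
    let mut seen = Vec::new();
    value.trace(&mut |addr| seen.push(addr));
    seen
}

fn main() {
    let pair = XorPair::new(0x1000, 0x2000);
    // Both addresses are reported even though 0x2000 is never stored as-is.
    assert_eq!(traced_addresses(&pair), vec![0x1000, 0x2000]);
}
```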

[0] If you're wondering in what situations a "cast from pointer and int to another pointer" would be necessary, consider how NaN-boxing or tagged pointers in JavaScript interpreters might be made sound.
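As an illustration of the footnote, a low-bit tagged pointer can be written with `map_addr`, which changes the address while keeping the original pointer's provenance. This is a sketch assuming Rust 1.84 or later; the function names and the choice of the lowest bit as the tag are assumptions for the example, not how any particular interpreter does it.

```rust
/// Set the lowest address bit as a tag, keeping the pointer's provenance.
/// Relies on `u64` being at least 2-byte aligned, so that bit is always free.
fn tag(p: *mut u64) -> *mut u64 {
    p.map_addr(|addr| addr | 1)
}

/// Check whether the tag bit is set.
fn is_tagged(p: *mut u64) -> bool {
    p.addr() & 1 == 1
}

/// Clear the tag bit, yielding a pointer that can be dereferenced again.
fn untag(p: *mut u64) -> *mut u64 {
    p.map_addr(|addr| addr & !1)
}

fn main() {
    let mut value = 7u64;
    let p: *mut u64 = &mut value;

    let tagged = tag(p);
    assert!(is_tagged(tagged));
    assert!(!is_tagged(p));

    // The untagged pointer still carries the provenance of `value`.
    let back = untag(tagged);
    assert_eq!(unsafe { *back }, 7);
}
```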

A Gc<T> that can't give you a pointer inside seems almost unusable in the context of Rust. Pointers are not a narrow use case; references are pointers.

Rust APIs are largely built around references. If you were to put a Vec<T> (dynamic array) into a pointerless Gc<T>, you would be almost entirely unable to access its contents. The only way to access it would be to swap it with an empty Vec, access it, then swap it back, à la Cell. You wouldn't even be able to clone the Vec without storing a dummy version in its place during the call.

https://doc.rust-lang.org/stable/std/cell/struct.Cell.html#m...
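To make the swap dance concrete, here is a small sketch that uses `std::cell::Cell` as a stand-in for a hypothetical pointerless Gc<T>. The helper names are made up; `Cell::take` and `Cell::set` are the real standard library methods.

```rust
use std::cell::Cell;

/// Read the length of a Vec that can only be moved in and out by value.
fn len_via_swap(slot: &Cell<Vec<i32>>) -> usize {
    let v = slot.take(); // leaves an empty Vec in the slot
    let n = v.len();
    slot.set(v); // put the real contents back
    n
}

/// Even cloning needs a dummy value in the slot during the call.
fn clone_via_swap(slot: &Cell<Vec<i32>>) -> Vec<i32> {
    let v = slot.take();
    let copy = v.clone();
    slot.set(v);
    copy
}

fn main() {
    let slot = Cell::new(vec![1, 2, 3]);
    assert_eq!(len_via_swap(&slot), 3);
    assert_eq!(clone_via_swap(&slot), vec![1, 2, 3]);
    // The original contents are back in place afterwards.
    assert_eq!(slot.into_inner(), vec![1, 2, 3]);
}
```

Anything that observes the slot between `take` and `set` sees the empty placeholder, which is part of why this pattern is so awkward for general use.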

You miss the point: I'm referring to cases where you pass pointers from one language into another. In that case, because GC is opt-in, it's the wrong approach for managing whatever you're passing into the non-Rust language.

In your case, do you need to get a pointer to a GC<T> and use it within Rust? I haven't worked with Rust at that level yet, so perhaps I'm ignorant of a more common use case.

Worse, conservatism in a GC further implies it can't be a moving GC, which means you can't compact, use bump pointer allocation, and so on. It keeps you permanently behind the frontier.

I remain bitterly disappointed that so much of the industry is so ignorant of the advances of the past 20 years. It's like it's 1950 and people are still debating whether their cloth and wood airplanes should be biplanes or triplanes.

The thing I don't understand is why anyone would pass a pointer to a GC'ed object into a 3rd party library (that's in a different language) and expect the GC to track the pointer there?

Passing memory into code that uses a different memory manager is always a case where automatic memory management shouldn't be used. I.e., when I'm using a 3rd party library in a different language, I don't expect it to know enough about my language's memory model to be able to effectively clean up pointers that I pass to it.

You can pass a pointer to a foreign library, but this requires temporarily making the pointee object a GC root because that library code is essentially sharing ownership of it with the GC.

> The thing I don't understand is why anyone would pass a pointer to a GC'ed object into a 3rd party library

The promise of GC is to free the programmer from the burden of memory management. If I can't give (perhaps fractional) ownership of a data structure to a library and expect its memory to be reclaimed at the appropriate time, have I freed myself from the burden of memory management?

Think about it this way:

Unless you are using malloc, and/or you don't need to do anything when the pointer is freed (the pointer doesn't reference anything else that needs to be freed or released), there's no way that a library written outside of your runtime knows how to free your memory.

Or to put it in a different way: Passing pointers to a native library is a small amount of what your application does and you still benefit from the garbage collector when you are running inside of your own language.

Yeah, but this is Rust. How is a moving GC supposed to handle an untagged union? Or a person who uses the now-stable provenance API to read/write pointer bits to/from disk?