by kazinator 3662 days ago
C gets out of the way and lets you do useful things that are "undefined behavior". How convenient is it in Rust to, say, use the unused bits in a pointer (due to alignment) and put a type tag in them?

As in C, you can cast a pointer to an integer and back. Rust allows such hacks if you mark them with a "hold my beer" keyword, `unsafe`:

    let the_bits:usize = unsafe { std::mem::transmute(pointer) };
You can also call `std::mem::forget(value)`, or `Box::into_raw(boxed)` for a heap allocation, to avoid fighting with Rust about who manages the memory.
You can actually turn a pointer into a usize without an unsafe block; only dereferencing the pointer you eventually get back requires `unsafe`. For example, here's how you'd do it with a reference:

    let the_bits = pointer as *const _ as usize;
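For the original question (a tag in the alignment bits), here is a minimal sketch, assuming a 2-bit tag and a pointee type aligned to at least 4 bytes. `TaggedPtr`, `TAG_BITS` and `TAG_MASK` are made-up names, not a library API. The tagging itself is plain integer arithmetic; only the dereference and the final free need `unsafe`.

```rust
use std::marker::PhantomData;
use std::mem::align_of;

// Hypothetical names: TaggedPtr, TAG_BITS, TAG_MASK.
const TAG_BITS: u32 = 2;
const TAG_MASK: usize = (1usize << TAG_BITS) - 1; // low two bits: 0b11

struct TaggedPtr<T> {
    bits: usize,                // untagged address | tag
    _owns: PhantomData<Box<T>>, // we logically own a Box<T>
}

impl<T> TaggedPtr<T> {
    fn new(value: T, tag: usize) -> Self {
        // The tag only fits if T's alignment guarantees the low bits are zero.
        assert!(align_of::<T>() > TAG_MASK, "alignment too small for tag");
        assert!(tag <= TAG_MASK, "tag does not fit");
        // Box::into_raw stops automatic freeing; Drop below takes over.
        let raw: *mut T = Box::into_raw(Box::new(value));
        TaggedPtr { bits: raw as usize | tag, _owns: PhantomData }
    }

    fn tag(&self) -> usize {
        self.bits & TAG_MASK
    }

    fn ptr(&self) -> *mut T {
        // Mask the tag off to recover the real address.
        (self.bits & !TAG_MASK) as *mut T
    }

    fn get(&self) -> &T {
        // Safety: ptr() is the address returned by Box::into_raw,
        // and it stays valid until Drop runs.
        unsafe { &*self.ptr() }
    }
}

impl<T> Drop for TaggedPtr<T> {
    fn drop(&mut self) {
        // Safety: rebuild the Box exactly once so the allocation is freed.
        unsafe { drop(Box::from_raw(self.ptr())) }
    }
}

fn main() {
    let p = TaggedPtr::new(42u64, 3);
    println!("tag = {}, value = {}", p.tag(), p.get()); // tag = 3, value = 42
}
```

Keeping the tagged word private behind a small wrapper means the rest of the program never sees an address with tag bits still set.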
The ISA is generally pretty well-defined; a lot of the undefined behavior is introduced by C.

So I don't think it's fair to say that C "gets out of the way". It won't let you get the overflow flag, or alias arbitrary pointers, for instance.
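For comparison, Rust exposes the overflow condition through methods on the integer types such as `overflowing_add` and `checked_add` (standard C had no portable way to do this until `ckd_add` in C23). A small sketch:

```rust
fn main() {
    let a: i32 = i32::MAX;
    // overflowing_add returns the wrapped result plus a flag,
    // roughly what the CPU's overflow flag would report.
    let (wrapped, overflowed) = a.overflowing_add(1);
    println!("{} {}", wrapped, overflowed); // -2147483648 true

    // checked_add turns the same condition into an Option.
    assert_eq!(a.checked_add(1), None);
    assert_eq!(1i32.checked_add(2), Some(3));
}
```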

Okay, now you've piqued my curiosity. Is that something you do because it's really smart and clever and fun, or is there a certain problem or class of problems where doing that is unambiguously the best or least-worst solution?
GHC (the main Haskell compiler) does it automatically as an optimisation: if an algebraic type has few enough constructors to fit in the spare low bits of an aligned pointer, it stores the constructor tag directly in the pointer, saving an indirection on pattern match.
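A rough sketch of the idea in Rust, for a two-constructor list type like Haskell's `data List = Nil | Cons Int List`. This is a simplification, not how GHC actually lays out its heap objects: here Nil is represented by its tag alone, and `nil`, `cons`, `sum` and `free` are hypothetical names. The point is that `sum` branches on the tag bits before touching the heap.

```rust
use std::mem::align_of;

const TAG_MASK: usize = 0b11;
const TAG_NIL: usize = 1; // constructor #1
const TAG_CONS: usize = 2; // constructor #2

struct Cons {
    head: i64,
    tail: usize, // another tagged word
}

fn nil() -> usize {
    // Nil has no fields, so in this sketch the tag alone represents it.
    TAG_NIL
}

fn cons(head: i64, tail: usize) -> usize {
    assert!(align_of::<Cons>() > TAG_MASK);
    let raw = Box::into_raw(Box::new(Cons { head, tail }));
    raw as usize | TAG_CONS
}

// "Pattern match": inspect the tag first, dereference only for Cons.
fn sum(mut list: usize) -> i64 {
    let mut total = 0;
    loop {
        match list & TAG_MASK {
            TAG_NIL => return total,
            TAG_CONS => {
                // Safety: Cons words come from Box::into_raw and are still live.
                let cell = unsafe { &*((list & !TAG_MASK) as *const Cons) };
                total += cell.head;
                list = cell.tail;
            }
            _ => unreachable!("invalid tag"),
        }
    }
}

fn free(mut list: usize) {
    while list & TAG_MASK == TAG_CONS {
        // Safety: each cell is turned back into a Box exactly once.
        let cell = unsafe { Box::from_raw((list & !TAG_MASK) as *mut Cons) };
        list = cell.tail;
    }
}

fn main() {
    let list = cons(1, cons(2, cons(3, nil())));
    println!("{}", sum(list)); // 6
    free(list);
}
```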

Some C data structures also use low-level bit tricks like this to save space and reduce indirections. For instance, the hash array mapped trie (HAMT) uses a 32-bit bitmap both to track which slots of a node are actually populated and, incidentally, how large the node currently is (the number of set bits). It's quite clever.
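The HAMT bitmap trick can be sketched like this, using a hypothetical `Node` type rather than any particular library's: a node has 32 logical slots but stores only the populated ones in a compact vector, and a slot's position in that vector is the number of set bits below it in the bitmap.

```rust
// Hypothetical HAMT-style node: `bitmap` marks which of 32 logical slots
// are populated; `children` stores only those, in slot order.
struct Node<T> {
    bitmap: u32,
    children: Vec<T>, // invariant: len() == bitmap.count_ones()
}

impl<T> Node<T> {
    fn new() -> Self {
        Node { bitmap: 0, children: Vec::new() }
    }

    // Compact index of slot `i` = popcount of the bits below `i`.
    fn index_of(&self, i: u32) -> usize {
        assert!(i < 32, "slot out of range");
        (self.bitmap & ((1u32 << i) - 1)).count_ones() as usize
    }

    fn get(&self, i: u32) -> Option<&T> {
        let idx = self.index_of(i);
        if self.bitmap & (1u32 << i) == 0 {
            None
        } else {
            Some(&self.children[idx])
        }
    }

    fn insert(&mut self, i: u32, value: T) {
        let idx = self.index_of(i);
        let bit = 1u32 << i;
        if self.bitmap & bit != 0 {
            self.children[idx] = value; // slot already populated: overwrite
        } else {
            self.bitmap |= bit;
            self.children.insert(idx, value);
        }
    }

    // The node's size falls out of the same bitmap.
    fn len(&self) -> usize {
        self.bitmap.count_ones() as usize
    }
}

fn main() {
    let mut node = Node::new();
    node.insert(5, "five");
    node.insert(1, "one");
    node.insert(30, "thirty");
    println!("{} {:?} {:?}", node.len(), node.get(5), node.get(2)); // 3 Some("five") None
}
```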

These are always my go-to examples for evaluating any alleged systems programming language. No language less powerful than a theorem prover is currently capable of expressing these idioms safely.

Rust actually does a bit of this internally in the form of the null pointer optimization. If you have an Option<&T> (or Option<Box<T>>, or Option<NonNull<T>>) value, it's stored as a single pointer-sized value, with None represented by the null pointer, since those pointer types are guaranteed never to be null. Raw pointers like *const T can be null, so Option<*const T> doesn't get this optimization.
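You can check this with `std::mem::size_of`. A small sketch; the last assertion holds because every bit pattern of a raw pointer is valid, leaving no spare value to encode None:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // None is encoded as the null address, so no separate discriminant is stored.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    assert_eq!(size_of::<Option<Box<u64>>>(), size_of::<Box<u64>>());
    assert_eq!(size_of::<Option<NonNull<u64>>>(), size_of::<*const u64>());

    // Raw pointers may be null, so Option needs extra space for its tag.
    assert!(size_of::<Option<*const u64>>() > size_of::<*const u64>());
    println!("null pointer optimization confirmed");
}
```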
As to3m says, this is often done in programming language interpreters.

If most objects in your language are heap-allocated and you want to store a small integer, you would have to allocate a new object on the heap with space for one integer, set up its headers, etc. Instead, you can set one of the unused bits in the pointer to indicate that the word holds an integer rather than a pointer, then store the integer in the remaining bits, avoiding the heap allocation entirely. The Lisp world calls this a fixnum.
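A minimal sketch of fixnum tagging, assuming a made-up representation where a word with its low bit set is an integer and a word with its low bit clear is an (aligned) heap pointer. `Value` and its methods are hypothetical; real Lisp implementations choose their own tag layouts.

```rust
// Hypothetical tagged word: low bit 1 = fixnum stored in the upper bits,
// low bit 0 = pointer to an aligned heap object.
#[derive(Clone, Copy)]
struct Value(usize);

impl Value {
    fn from_fixnum(n: isize) -> Value {
        // One bit goes to the tag, so the usable range shrinks by one bit.
        assert!(n >= isize::MIN >> 1 && n <= isize::MAX >> 1, "out of fixnum range");
        Value(((n << 1) | 1) as usize)
    }

    fn is_fixnum(self) -> bool {
        self.0 & 1 == 1
    }

    fn as_fixnum(self) -> Option<isize> {
        if self.is_fixnum() {
            // Arithmetic right shift on isize drops the tag and keeps the sign.
            Some((self.0 as isize) >> 1)
        } else {
            None
        }
    }
}

fn main() {
    let a = Value::from_fixnum(21);
    let b = Value::from_fixnum(-7);
    println!("{:?} {:?}", a.as_fixnum(), b.as_fixnum()); // Some(21) Some(-7)

    // An even word (standing in for an aligned pointer) is not a fixnum.
    assert_eq!(Value(0x1000).as_fixnum(), None);
}
```

No allocation happens for `a` or `b`: the integer lives entirely in the word that would otherwise hold a pointer.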

The more pointer bits you can steal, the more kinds of data you can store directly in the "pointer" itself. It's also possible to store the type of objects that are actually allocated on the heap in tags on the pointers to them, but I don't know if that's done any more.

You might do it if you were writing an implementation of a language such as Scheme.