Oddly enough, the unused bits are in the middle of the address. They're also sign-extended rather than zero-filled, so sometimes they are all ones and other times all zeros.
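Here is a minimal C sketch of what "canonical" means on x86-64. It assumes 48-bit virtual addresses (4-level paging; with 5-level paging the boundary moves to bit 56). Under that assumption, bits 63..47 must all be copies of bit 47, and the CPU faults on a memory access through an address that breaks this rule. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (x86-64 with 4-level paging). */
#define VA_BITS 48
static_assert(VA_BITS > 0 && VA_BITS < 64, "VA_BITS out of range");

/* Canonical: the 17 bits from bit 47 up to bit 63 are either all zeros
   (lower half, typically userspace) or all ones (upper half, typically
   the kernel). */
static bool is_canonical(uint64_t addr) {
    uint64_t top = addr >> (VA_BITS - 1);                   /* bits 47..63 */
    uint64_t all_ones = (UINT64_C(1) << (65 - VA_BITS)) - 1; /* 0x1FFFF */
    return top == 0 || top == all_ones;
}

/* Force an address into canonical form by sign-extending bit 47. */
static uint64_t canonicalize(uint64_t addr) {
    const uint64_t low_mask = (UINT64_C(1) << VA_BITS) - 1;  /* bits 0..47 */
    if (addr & (UINT64_C(1) << (VA_BITS - 1)))
        return addr | ~low_mask;  /* bits 48..63 become ones */
    return addr & low_mask;       /* bits 48..63 become zeros */
}
```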
Hmm, it appears that the top byte on arm64 is only ignored if TBI (Top Byte Ignore) is enabled.
I don't think pointer signing requires TBI, though. Pointer signing uses a PAC instruction to sign a pointer and an AUT instruction to verify the signature and strip it back out, but in its signed form the value is not a usable pointer. So pointers that are actually used for addressing need not support non-canonical addresses.
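The following is a purely software analogy in C that makes the sign/authenticate/strip flow concrete. It is not how PAC works internally. Real hardware computes the code with a cipher such as QARMA using keys held in system registers, and the width of the code field depends on the configured virtual address size. The function names, the 16-bit field above bit 47, and the toy mixing function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 48-bit user addresses (upper 16 bits zero) and no TBI. */
#define VA_BITS 48
#define ADDR_MASK ((UINT64_C(1) << VA_BITS) - 1)

/* Toy keyed mixing function (splitmix64-style finalizer). NOT cryptographic. */
static uint64_t toy_mac(uint64_t ptr, uint64_t modifier, uint64_t key) {
    uint64_t x = ptr ^ (modifier * UINT64_C(0x9E3779B97F4A7C15)) ^ key;
    x = (x ^ (x >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    x = (x ^ (x >> 27)) * UINT64_C(0x94D049BB133111EB);
    return x ^ (x >> 31);
}

/* "Sign": place a 16-bit code in the otherwise unused upper bits. */
static uint64_t toy_pac(uint64_t ptr, uint64_t modifier, uint64_t key) {
    assert((ptr & ~ADDR_MASK) == 0);  /* expects a plain user address */
    uint64_t code = toy_mac(ptr, modifier, key) >> VA_BITS;  /* top 16 bits */
    return ptr | (code << VA_BITS);
}

/* "Authenticate": recompute the code. On success, store the stripped,
   usable pointer in *out. On failure, return false (real AUT instructions
   instead corrupt the pointer so it faults on use, or trap directly,
   depending on the implementation). */
static bool toy_aut(uint64_t signed_ptr, uint64_t modifier, uint64_t key,
                    uint64_t *out) {
    uint64_t ptr = signed_ptr & ADDR_MASK;
    if (toy_pac(ptr, modifier, key) != signed_ptr)
        return false;
    *out = ptr;
    return true;
}
```

The sketch mirrors the point in the comment: the signed value only becomes a usable pointer again after authentication strips the code.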
It's for a different purpose: mitigating, to some extent, security bugs. It also isn't an Apple-only feature but an Arm one, which is only now rolling out on Cortex cores with the Cortex-A78C and A78AE.
Yes, for userspace addresses they are generally 0. More importantly, they can be used to store other data, a technique commonly referred to as pointer tagging or pointer smuggling.
It's a useful optimisation technique: you can store some extra metadata in the pointer itself and read it without having to dereference the pointer.
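Here is a minimal sketch of high-bit pointer tagging in C. It assumes 48-bit canonical addresses. A 16-bit tag goes into bits 48..63, and reading the tag needs no memory access. Before the pointer is dereferenced, it is rebuilt by sign-extending bit 47. `tag_ptr`, `get_tag` and `untag_ptr` are hypothetical names. The code also assumes that nothing else, such as a hardware feature or a sanitizer, is already using those bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48-bit canonical addresses; bits 48..63 carry the tag. */
#define VA_BITS 48
#define ADDR_MASK ((UINT64_C(1) << VA_BITS) - 1)

/* Pack a 16-bit tag into the upper bits of a pointer. */
static uint64_t tag_ptr(const void *p, uint16_t tag) {
    uint64_t addr = (uint64_t)(uintptr_t)p;
    uint64_t top = addr >> (VA_BITS - 1);
    assert(top == 0 || top == 0x1FFFF);  /* must be canonical, or bits are lost */
    return (addr & ADDR_MASK) | ((uint64_t)tag << VA_BITS);
}

/* Read the tag without touching the pointed-to memory. */
static uint16_t get_tag(uint64_t tagged) {
    return (uint16_t)(tagged >> VA_BITS);
}

/* "Repair" the pointer: drop the tag and sign-extend bit 47 so the
   result is canonical again and safe to dereference. */
static void *untag_ptr(uint64_t tagged) {
    uint64_t addr = tagged & ADDR_MASK;
    if (addr & (UINT64_C(1) << (VA_BITS - 1)))
        addr |= ~ADDR_MASK;
    return (void *)(uintptr_t)addr;
}
```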
The reason amd64 checks whether addresses are “canonical” is to discourage exactly this trick. On almost all platforms that simply ignored the upper byte of pointers (m68k, s390, IIRC even early ARMs), this led to significant compatibility issues once later models started using those bits for addressing.
As for storing tags in pointers on 64-bit platforms, it is probably better to use the 3 low-order bits, which are always zero for 8-byte-aligned objects. Another useful trick, used in PDP-10 MacLisp and still used by the BDW GC, is to encode the type information in the virtual memory layout itself.
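Here is low-bit tagging, sketched in C. It assumes that every tagged object is at least 8-byte aligned, so the bottom 3 bits of its address are always zero. Unlike high-bit tagging, untagging is a single mask with no sign extension, and it does not depend on how many virtual address bits the CPU implements. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* With 8-byte alignment the low 3 bits are free for a tag (values 0..7). */
#define TAG_BITS 3
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1))

static uintptr_t tag_low(const void *p, unsigned tag) {
    uintptr_t addr = (uintptr_t)p;
    assert((addr & TAG_MASK) == 0);  /* object must be 8-byte aligned */
    assert(tag <= TAG_MASK);         /* tag must fit in 3 bits */
    return addr | tag;
}

/* Read the tag without dereferencing. */
static unsigned low_tag(uintptr_t tagged) {
    return (unsigned)(tagged & TAG_MASK);
}

/* Clearing the tag bits gives back the original, directly usable pointer. */
static void *untag_low(uintptr_t tagged) {
    return (void *)(tagged & ~TAG_MASK);
}
```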
I guess it checks it when you actually try to dereference the pointer?
On Intel, too, you still have to "repair" the pointer before you use it.
It's definitely not the safest optimisation but it can be used to great effect when needed.
I think Intel is adding CPU support for pointer tagging operations in the future, which should make them a lot easier, safer, and more efficient to work with. I can't find a reference right now, though; it isn't called pointer tagging.
Any more information on encoding the type information in virtual memory layout? Sounds cool.
I guess you have different types allocated in specific regions?
Most general-purpose ISAs with some kind of intrinsic support for tagged pointers (e.g. SPARC, and IIRC RISC-V has something similar) also prefer the tags in the low-order bits.
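To show what ISA support for low-bit tags looks like, here is a C model of the check performed by SPARC's TADDcc (tagged add) instruction. Integers ("fixnums") carry a tag of 00 in their low two bits. The add sets the overflow flag if either operand has a non-zero tag or if the 32-bit sum overflows (TADDccTV traps instead). This models only the semantics. The helper names and the 30-bit fixnum encoding are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixnums keep a 2-bit tag of 00 in their low bits: n is stored as n * 4. */
#define FIXNUM_TAG_MASK 0x3

static int32_t make_fixnum(int32_t n) {
    assert(n >= INT32_MIN / 4 && n <= INT32_MAX / 4);  /* must fit in 30 bits */
    return n * 4;
}

static int32_t fixnum_value(int32_t f) {
    return f / 4;  /* exact, because f is a multiple of 4 */
}

/* Model of TADDcc's overflow condition: returns false if either operand
   has a non-zero tag or the 32-bit sum overflows. Otherwise it stores the
   sum, whose tag is 00 because both operand tags are 00. */
static bool tagged_add(int32_t a, int32_t b, int32_t *out) {
    if ((a | b) & FIXNUM_TAG_MASK)
        return false;
    int64_t sum = (int64_t)a + b;
    if (sum < INT32_MIN || sum > INT32_MAX)
        return false;
    *out = (int32_t)sum;
    return true;
}
```

A Lisp runtime would typically treat the false case as the signal to take a slow path, for example for non-fixnum operands or promotion to a bignum.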
And you are right: the tag-in-the-address trick involves allocating objects of each type in their own contiguous regions, usually such that a whole page contains objects of the same type (as far as the tagging scheme is concerned). Then either you mask off the lower ten-ish bits of the pointer to get to a type header, or you keep some global out-of-line map from page frame to type.
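Here is a minimal C sketch of this scheme, often called BIBOP ("big bag of pages"), using the in-page header variant. Each page-aligned block holds objects of a single type. An object's type is found by masking its address down to the start of its block. The 4 KiB block size, the 64-byte header area, and all names are assumptions. A real collector would also track free slots, which this sketch omits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TYPED_PAGE_SIZE 4096u  /* assumed block size: mask off 12 low bits */
#define HEADER_SPACE 64u       /* bytes reserved for the header in each block */

enum obj_type { TYPE_CONS = 1, TYPE_FLOAT = 2 };

struct page_header {
    enum obj_type type;  /* every object in this block has this type */
};
static_assert(sizeof(struct page_header) <= HEADER_SPACE, "header too big");

/* Allocate a block aligned to its own size and dedicate it to one type. */
static void *new_typed_page(enum obj_type type) {
    struct page_header *h = aligned_alloc(TYPED_PAGE_SIZE, TYPED_PAGE_SIZE);
    if (h == NULL)
        return NULL;
    h->type = type;
    return h;
}

/* Address of the n-th object slot of the given size within a block. */
static void *page_slot(void *page, size_t size, size_t n) {
    assert(HEADER_SPACE + (n + 1) * size <= TYPED_PAGE_SIZE);
    return (char *)page + HEADER_SPACE + n * size;
}

/* Recover the type from the address alone: round down to the block start
   and read its header. */
static enum obj_type type_of(const void *obj) {
    uintptr_t page = (uintptr_t)obj & ~(uintptr_t)(TYPED_PAGE_SIZE - 1);
    return ((const struct page_header *)page)->type;
}
```

The out-of-line variant would replace the header read in `type_of` with a lookup in a global table indexed by the block number (the address shifted right by 12), which keeps the blocks themselves header-free.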