| In terms of category theory, what we would need is subtraction and division types on top of product and sum types. So a 32-bit integer is the product of 32 two-state bit types. Something akin to NonZero could be defined as that type minus one state, such that there are now 4294967296 - 1 representable values. Similarly, pointer types on some machines always have some bits set to 0 due to hardware constraints. These can be represented in the type system as 2^64 / 2^n where 'n' is the number of bits that are not usable, resulting in something like 2^46 for typical CPUs. This would allow extra bits of state to be "packed in there". This concept is more generally useful, not just for bit-packing. For example, a database table row might be represented by a struct that contains a field for every column. But then, how to represent the result of a SELECT query!? Either you code-gen this into a new type, create that type manually, or use compiler-generated "unspeakable" type names. (Linq in C# does this.) Or... the language could natively support type division, so you could use one struct type for "all columns" and then use division on the type to delete columns you don't need for a particular query or REST response. There's a whole ecosystem of tools that work around the problem of not having this as a built-in feature in typical languages: AutoMapper, automap, Mapster, etc... |
I can see a couple of problem with this approach:
- you need to be able to talk about bit-level types, which contradicts the common assumption that all types are addressable, i.e. whose size and alignment is some positive integer (or zero) number of bytes;
- what if you want to substract more complex set of values? For example describing a 32 bit number without all multiples of 7? And how do you encode this in the compiler in such a way that type checking remains decidable?
- how do you safely and concisely express conversions and operations on these kind of types?
> Similarly, pointer types on some machines always have some bits set to 0 due to hardware constraints. These can be represented in the type system as 2^64 / 2^n where 'n' is the number of bits that are not usable, resulting in something like 2^46 for typical CPUs. This would allow extra bits of state to be "packed in there".
Note that this is not forward compatible, since newer CPUs can start using more bits. In fact recent CPUs started using 57 bits instead of 48 bits, so a program that packed more state in those top bits would now be broken. You should generally instead try to pack data in the least significant bits that will always be 0 due to alignment constraints.
Moreover the top bits are not always 0, they are equal to the most relevant bit of the used part of the address. On Linux this just happens to always be 0 for userspace addresses (and 1 for kernel addresses) but this won't be the case on all architectures and OSes.
I also wonder how you would define these types using subtraction/division types such that they are different?
- the address type being 64 bits but using only the least significant 48 bits and having the top 16 bits always 0 - the address type being 64 bits but using only the least significant 48 bits and having the top 16 bits always equal to the 48th bit - the address type being 48 bits
Clearly these types are all isomorphic, but they are definitionally different and in a way that really matters to the hardware.