|
Really cool solution! One question: maybe I missed it, but there's no technical reason the tag bits could not use the entire range of exponent bits, no? Other than the fact that having up to 2048 tags would be ridiculously branchy, I guess. Here's a variation I just thought of, which probably has a few footguns I'm overlooking right now: use the seven highest exponent bits instead of three. Then we can directly read the most significant byte of a double, skipping the need for a rotation altogether. After reading the most significant byte, subtract 16 (or 0b00010000) from it, then mask out the sign. To test for unboxed doubles, test if the resulting value is bigger than 15. If so, it is an unboxed double, otherwise the lower four bits of the byte are available as alternative tags (so 16 tags). Effectively, we adjusted the exponent range 2⁻⁷⁶⁷..2⁻⁵¹¹ into the 0b(0)0000000 - 0b(0)0001111 range and made them boxed doubles. Every other double, which is 15/16th of all possible doubles, is now unboxed. This includes subnormals, zero, all NaN encodings (so it even can exist in superposition with a NaN or NuN tagging system I guess?) and both infinities. To be clear, this is off the top of my head so maybe I made a few crucial mistakes here. |
There are many reasons why 3-bit tags work well in practice. Importantly, it allows aligning heap objects on 64-bit machine words. Dereferencing a tagged pointer can then be done in a single machine instruction, a MOV offset by the tag.
One of our goals is to make self-tagging as straightforward to implement in existing systems as possible. Thus requiring to move away from the ubiquitous 3-bit tag scheme is definitely a no-go.
Another goal is performance, obviously. It turns out that if the transformation from self-tagged float to IEEE754 float requires more than a few (2-3) machine instructions, it is no longer as advantageous. Thus the choice of tags 000, 011 and 100, which encoding/decoding is a single bitwise rotation.
Also keep in mind that assigning more tags to self-tagging to capture floats that are never used in practice just adds a strain on other objects. That's why we include a usage analysis of floats to guide tag selection in our paper.
In fact, we are currently working on an improved version that uses a single tag to capture all "useful" values. Capturing 3/8 of floats seems unnecessary, especially since one of the tag is only meant to capture +-0.0. The trick is to rotate the tag to bits 2-3-4 of the exponent instead of 1-2-3 and add an offset to the exponent to "shift" the range of captured values.
But in the end, it feels like we are barely scratching the surface and that a lot of fine-tuning can still be done by toying with tag placement and transformation. But I think the next step is a quality over quantity improvement: capture less floats but capture the right ones.