| Hi, I'm one of the authors of the paper. Thanks for your questions and comments! There are many reasons why 3-bit tags work well in practice. Importantly, they allow heap objects to be aligned on 64-bit machine words. Dereferencing a tagged pointer can then be done in a single machine instruction, a MOV with the address offset by the tag. One of our goals is to make self-tagging as straightforward as possible to implement in existing systems, so requiring implementers to move away from the ubiquitous 3-bit tag scheme is definitely a no-go. Another goal is performance, obviously. It turns out that if the transformation from a self-tagged float to an IEEE 754 float requires more than a few (2-3) machine instructions, it is no longer as advantageous. Hence the choice of tags 000, 011 and 100, for which encoding and decoding are each a single bitwise rotation. Also keep in mind that assigning more tags to self-tagging in order to capture floats that are never used in practice just puts a strain on other objects. That's why we include a usage analysis of floats to guide tag selection in our paper. In fact, we are currently working on an improved version that uses a single tag to capture all "useful" values. Capturing 3/8 of floats seems unnecessary, especially since one of the tags is only meant to capture ±0.0. The trick is to rotate the tag to bits 2-3-4 of the exponent instead of 1-2-3 and add an offset to the exponent to "shift" the range of captured values. But in the end, it feels like we are barely scratching the surface and that a lot of fine-tuning can still be done by toying with tag placement and transformation. But I think the next step is a quality over quantity improvement: capture less floats but capture the right ones. |
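For readers who want to see the rotation in action, here is a minimal sketch in Python. It is not the paper's implementation. It assumes the tag is made of the top three exponent bits and that a rotate-left by 4 (sign bit plus those three bits) brings them into the low three bits of the word; `ROT`, `try_self_tag` and `untag` are names invented for this illustration.

```python
import struct

MASK64 = (1 << 64) - 1
ROT = 4                             # assumption: sign bit + top three exponent bits
SELF_TAGS = {0b000, 0b011, 0b100}   # the three tags named in the comment


def float_to_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit IEEE 754 pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret a 64-bit pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def rotl64(v: int, n: int) -> int:
    return ((v << n) | (v >> (64 - n))) & MASK64


def rotr64(v: int, n: int) -> int:
    return ((v >> n) | (v << (64 - n))) & MASK64


def try_self_tag(x: float):
    """Return the self-tagged word, or None if the float would need boxing."""
    word = rotl64(float_to_bits(x), ROT)
    return word if (word & 0b111) in SELF_TAGS else None


def untag(word: int) -> float:
    """Undo the rotation to recover the original double."""
    return bits_to_float(rotr64(word, ROT))


for x in (1.0, 2.0, -0.0, 1e300):
    word = try_self_tag(x)
    if word is None:
        print(x, "-> would be boxed")
    else:
        print(x, "-> tag", format(word & 0b111, "03b"), "round-trips to", untag(word))
```

On hardware, each of `rotl64` and `rotr64` corresponds to a single rotate instruction (ROL/ROR on x86-64), which is what keeps the conversion within the few-instruction budget mentioned above.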
> The trick is to rotate the tag to bits 2-3-4 of the exponent instead of 1-2-3 and add an offset to the exponent to "shift" the range of captured values.
Maybe I misunderstand, but isn't that similar to the idea I just described? Adding an offset to "rotate" the ranges of the exponent by a segment, putting the one with zero on the high side? The main difference is that you stick to the upper four bits of the exponent, whereas I suggested using one of the upper three-bit sequences to mark bits 4-5-6-7 as tag bits. (I mistakenly included the sign bit before and claimed this unboxes 15/16ths of all doubles; it's actually 7/8ths.) That probably has consequences for minimizing instructions, as you mentioned.
> But I think the next step is a quality over quantity improvement: capture less floats but capture the right ones.
I suspect that ensuring NaN and Infinity are in there will be crucial to avoid performance cliffs in certain types of code. I have seen production code where initialized values are known to always be finite, so either of those two is then used to "tag" a float value as "missing" or something to that effect.
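To make the concern concrete, here is a small Python sketch. It assumes (my reading of the comment above, not something confirmed by the paper) that the tag is simply the top three exponent bits of the double; `top3_exponent_bits` is a name made up for this example. Since Infinity and NaN have an all-ones exponent in IEEE 754, they would land on 111, which is not one of 000, 011 or 100.

```python
import struct

SELF_TAGS = {0b000, 0b011, 0b100}  # tags named in the parent comment


def top3_exponent_bits(x: float) -> int:
    """Extract bits 62..60 of the double, i.e. the top three exponent bits."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 60) & 0b111


samples = [("1.0", 1.0), ("inf", float("inf")), ("-inf", float("-inf")), ("nan", float("nan"))]
for name, value in samples:
    tag = top3_exponent_bits(value)
    status = "self-tagged" if tag in SELF_TAGS else "boxed"
    print(f"{name:>4}: exponent prefix {tag:03b} -> {status}")
```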
Anyway, looking forward to future results of your research!