| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by vanderZwan 573 days ago

Really cool solution! One question: maybe I missed it, but there's no technical reason the tag bits could not use the entire range of exponent bits, no? Other than the fact that having up to 2048 tags would be ridiculously branchy, I guess.

Here's a variation I just thought of, which probably has a few footguns I'm overlooking right now: use the seven highest exponent bits instead of three. Then we can directly read the most significant byte of a double, skipping the need for a rotation altogether.

After reading the most significant byte, subtract 16 (or 0b00010000) from it, then mask out the sign. To test for unboxed doubles, test if the resulting value is bigger than 15. If so, it is an unboxed double, otherwise the lower four bits of the byte are available as alternative tags (so 16 tags).

Effectively, we adjusted the exponent range 2⁻⁷⁶⁷..2⁻⁵¹¹ into the 0b(0)0000000 - 0b(0)0001111 range and made them boxed doubles. Every other double, which is 15/16th of all possible doubles, is now unboxed. This includes subnormals, zero, all NaN encodings (so it even can exist in superposition with a NaN or NuN tagging system I guess?) and both infinities.

To be clear, this is off the top of my head so maybe I made a few crucial mistakes here.

2 comments

omelancon 573 days ago

Hi, I'm one of the authors of the paper. Thanks for your questions and comments!

There are many reasons why 3-bit tags work well in practice. Importantly, it allows aligning heap objects on 64-bit machine words. Dereferencing a tagged pointer can then be done in a single machine instruction, a MOV offset by the tag.

One of our goals is to make self-tagging as straightforward to implement in existing systems as possible. Thus requiring to move away from the ubiquitous 3-bit tag scheme is definitely a no-go.

Another goal is performance, obviously. It turns out that if the transformation from self-tagged float to IEEE754 float requires more than a few (2-3) machine instructions, it is no longer as advantageous. Thus the choice of tags 000, 011 and 100, which encoding/decoding is a single bitwise rotation.

Also keep in mind that assigning more tags to self-tagging to capture floats that are never used in practice just adds a strain on other objects. That's why we include a usage analysis of floats to guide tag selection in our paper.

In fact, we are currently working on an improved version that uses a single tag to capture all "useful" values. Capturing 3/8 of floats seems unnecessary, especially since one of the tag is only meant to capture +-0.0. The trick is to rotate the tag to bits 2-3-4 of the exponent instead of 1-2-3 and add an offset to the exponent to "shift" the range of captured values.

But in the end, it feels like we are barely scratching the surface and that a lot of fine-tuning can still be done by toying with tag placement and transformation. But I think the next step is a quality over quantity improvement: capture less floats but capture the right ones.

link

vanderZwan 573 days ago

Thank you for the explanation! I was not aware of the context behind the choice of three-bit tags.

> The trick is to rotate the tag to bits 2-3-4 of the exponent instead of 1-2-3 and add an offset to the exponent to "shift" the range of captured values.

Maybe I misunderstand, but isn't that a similar idea to what I just described? Adding an offset to "rotate" the ranges of the exponent by a segment, putting the one with zero in the high side? The main difference being that you stick to the upper four bits of the exponent, and that I suggested using one of the upper three bit sequences to mark bits 4-5-6-7 as tag bits? (I mistakenly included the sign bit before and claimed this unboxes 15/16ths of all doubles, it's actually 7/8ths) Which probably has consequences for minimizing instructions, like you mentioned.

> But I think the next step is a quality over quantity improvement: capture less floats but capture the right ones.

I suspect that ensuring NaN and Infinity are in there will be crucial to avoid performance cliffs in certain types of code; I have seen production code where it is known that initiated values are always finite, so either of those two are then used as ways to "tag" a float value as "missing" or something to that degree.

Anyway, looking forward to future results of your research!

link

omelancon 572 days ago

> Maybe I misunderstand, but isn't that a similar idea to what I just described? Adding an offset to "rotate" the ranges of the exponent by a segment...

Yes it is similar. It seems to me that there really isn't that many useful operations that can be applied to the exponent beside adding an offset. But that's only a suspicion, do not take my word for it.

> I suspect that ensuring NaN and Infinity are in there will be crucial to avoid performance cliffs...

This is a reasonable assumption. There are in fact ways to rotate and add an offset such that the exponent can overflows/underflows to capture exponents 0 and 0x7ff (for inf and nan) with a single well-positioned tag. Making it work in practice is not as simple, but we are working on it.

link

vanderZwan 571 days ago

Yeah, this is one of those situations where speculating about behavior one thing, but well-designed benchmarks and tests may very well give some surprising results. I am very much looking forward to the follow-up paper where you'll share your results :). Good luck with the research to you and you colleagues!

link

jacobp100 572 days ago

Just to add some JS engine info - every engine stores numbers as 32 bit integers if possible, which is usually done by tagging a pointer. Also, JSC needs pointers to be represented exactly as-is, because it will scan the stack for anything that looks like a pointer, and retain it when running the GC

link

funny_falcon 572 days ago

Just correction: CRuby uses “Float Self-Tagging” for years.

link

refulgentis 572 days ago

This is your 4th comment claiming this: it's not true.

You're right that Ruby uses tags, ex. Objective-C does also and has for a while.

The innovation here is its a tag without the tag bits. That's why its self-tagging, not tagging.

link

SonOfLilit 572 days ago

In another comment, this user links a CRuby commit that they claim adds it. It seems legit.

Linked commit contains code for rotating tagged floats so bits 60..62 go to the least significant positions, and a comment about a range of unboxed floats between 1.7...e-77 and 1.7...e77, plus special casing 0.0

e.g. this excerpt:

    #if USE_FLONUM
    #define RUBY_BIT_ROTL(v, n) (((v) << (n)) | ((v) >> ((sizeof(v) * 8) - n)))
    #define RUBY_BIT_ROTR(v, n) (((v) >> (n)) | ((v) << ((sizeof(v) * 8) - n)))
    
    static inline double
    rb_float_value(VALUE v)
    {
      if (FLONUM_P(v)) {
 if (v == (VALUE)0x8000000000000002) {
     return 0.0;
 }
 else {
     union {
  double d;
  VALUE v;
     } t;

     VALUE b63 = (v >> 63);
     /* e: xx1... -> 011... */
     /*    xx0... -> 100... */
     /*      ^b63           */
     t.v = RUBY_BIT_ROTR(((b63 ^ 1) << 1) | b63 | (v & ~0x03), 3);
     return t.d;
 }
      }
      else {
 return ((struct RFloat *)v)->float_value;
      }
    }

link

vanderZwan 573 days ago

Also, funny enough, this idea and variations of it look surprisingly easy to implement in JS itself using typed arrays. I won't any claims of impact on performance though...

You need a Float64Array for the values (plus a Uint32Array/Uint8Array using the same buffer to be able to manipulate the integer bits, which technically also requires knowledge of endianness since it's technically not in the JS spec and some funny mobile hardware still exists out there). Mocking a heap is easy enough: the "heap" can be a plain array, the "pointers" indices into the array. Using a Map<value, index> would you "intern" duplicate values (i.e. ensure doubles outside the unboxed range are only "heap" allocated once).

link