by commandlinefan 2241 days ago
Sure, I had to wrestle with this a bit myself. To make it easier, imagine a 16-bit floating point format (P&H call this the "Nvidia format", but I can't find that documented anywhere but there): 1-bit sign, 5-bit (biased) exponent, 10-bit mantissa. One thing that TFA leaves out about floating point mantissas is that there's an implicit leading 1, so a 10-bit mantissa of 1111100000 would be interpreted as (binary) 1.1111100000, or 1 + 2^-1 + 2^-2 + 2^-3 + 2^-4 + 2^-5 = 1.96875. So: take the mantissa, convert it to a fraction, add 1, and then multiply by 2 raised to the power of the stored exponent minus the bias.
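
If it helps, here is a minimal Python sketch of just the implicit-leading-1 step. The function name `significand` and its `width` parameter are made up for illustration; they aren't from P&H or TFA.

```python
def significand(mantissa_bits: int, width: int = 10) -> float:
    """Turn a stored mantissa field into 1.f (implicit leading 1).

    `mantissa_bits` is the raw field as an integer; `width` is the
    number of stored bits (10 in the 16-bit format described above).
    """
    return 1 + mantissa_bits / (1 << width)


# 1111100000 -> 1 + 2^-1 + 2^-2 + 2^-3 + 2^-4 + 2^-5
value = significand(0b1111100000)
```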

So now, take the 16-bit pattern (0000 0011 1110 0000); you get a 0 sign bit, an exponent field of 00000 = 0, and a mantissa of 1.96875. So, with a bias of 15, that's 1.96875 * 2^(0-15) = 0.00006008148193359375.
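
The same decoding, written out as a rough Python sketch. `decode16` is a hypothetical helper name, and it deliberately follows the simplified rules used here: every pattern gets the implicit leading 1, with no special cases for subnormals, infinities or NaN.

```python
def decode16(bits: int, bias: int = 15) -> float:
    """Decode a 16-bit pattern: 1 sign bit, 5 exponent bits, 10 mantissa bits.

    Simplified on purpose: the implicit leading 1 is always applied.
    """
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F      # stored (biased) exponent field
    fraction = bits & 0x3FF             # 10 stored mantissa bits
    value = (1 + fraction / 1024) * 2.0 ** (exponent - bias)
    return -value if sign else value


example = decode16(0b0000_0011_1110_0000)   # 1.96875 * 2^(0-15)
```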

As you creep up to the "next" exponent, you see that the boundaries are respected. The last 2^-15 number is 0000 0011 1111 1111 (0x3ff) and the first 2^-14 number is 0000 0100 0000 0000 (0x400). The exponent field has changed from 0 to 1, and the two patterns convert to 1.999023 * 2^-15 = 0.0000610053539276123046875 and 1.000000 * 2^-14 = 0.00006103515625, so the bit-by-bit comparison 0x3ff < 0x400 agrees with the numeric one. (This would be true regardless of the bias.)
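
You can check the boundary claim by brute force. This is only a sketch under the same simplifying assumptions (implicit leading 1 everywhere, no infinities or NaN), and `decode16` is again a made-up helper name:

```python
def decode16(bits: int, bias: int = 15) -> float:
    """Simplified decoder: sign, 5-bit biased exponent, 10-bit mantissa."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    value = (1 + fraction / 1024) * 2.0 ** (exponent - bias)
    return -value if sign else value


# The two patterns on either side of the exponent boundary.
below, above = decode16(0x3FF), decode16(0x400)

# Every non-negative pattern, in increasing integer order.
values = [decode16(p) for p in range(0x8000)]
monotonic = all(a < b for a, b in zip(values, values[1:]))

# Same check with a different bias: the ordering does not depend on it.
shifted = [decode16(p, bias=7) for p in range(0x8000)]
monotonic_other_bias = all(a < b for a, b in zip(shifted, shifted[1:]))
```

Both `monotonic` and `monotonic_other_bias` come out true: for non-negative patterns, comparing the raw 16-bit integers gives the same answer as comparing the decoded values.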

Now imagine that IEEE 754 stored exponents in two's-complement format instead; the 5-bit exponent 01111 would be interpreted as +15, but the "next" exponent, bit-wise, would be 10000 = -16. This means that you'd end up with 0011111111111111 = 1.999023 * 2^15 = 65504, but the next binary number, 0100000000000000, would be 1.0 * 2^-16 = 0.0000152587890625: the larger bit pattern is now the much smaller number, so a plain integer comparison no longer orders the values correctly.
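
Here is a sketch of that hypothetical two's-complement variant, assuming a 5-bit signed exponent field (which covers -16 through +15). No real format works this way as far as I know; `decode16_twos` is an invented name, only there to show the ordering breaking.

```python
def decode16_twos(bits: int) -> float:
    """Hypothetical format: exponent stored as 5-bit two's complement."""
    sign = (bits >> 15) & 0x1
    field = (bits >> 10) & 0x1F
    # Sign-extend the 5-bit field: 01111 -> +15, 10000 -> -16.
    exponent = field - 32 if field & 0x10 else field
    fraction = bits & 0x3FF
    value = (1 + fraction / 1024) * 2.0 ** exponent
    return -value if sign else value


before = decode16_twos(0b0011_1111_1111_1111)   # exponent field 01111
after = decode16_twos(0b0100_0000_0000_0000)    # exponent field 10000

# The integer patterns are in increasing order, the values are not.
order_preserved = before < after
```

Here `order_preserved` is false even though 0x3fff < 0x4000 as integers, which is exactly the problem the biased encoding avoids.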

1 comments

> imagine a 16-bit floating point format (P&H call this the "Nvidia format", but I can't find that documented anywhere but there): 1-bit sign, 5-bit (biased) exponent, 10-bit mantissa

According to Wikipedia (https://en.wikipedia.org/wiki/Half-precision_floating-point_...), this is the IEEE 754 standard binary16 format.

Sort of - I skipped over subnormal numbers.
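
To see what skipping subnormals changes, here's a small comparison against Python's `struct` module, whose `e` format code is IEEE 754 binary16. The helper names `ieee_half` and `simplified` are just for illustration.

```python
import struct


def ieee_half(bits: int) -> float:
    """Decode a 16-bit pattern as real IEEE 754 binary16 via struct's 'e'."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def simplified(bits: int, bias: int = 15) -> float:
    """The simplified decoding from above: implicit leading 1 everywhere."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    value = (1 + fraction / 1024) * 2.0 ** (exponent - bias)
    return -value if sign else value


# Exponent field 0: binary16 treats this as subnormal (no implicit 1,
# fixed scale of 2^-14), so the two decodings disagree.
subnormal_ieee = ieee_half(0x03E0)         # 992 * 2^-24
subnormal_simplified = simplified(0x03E0)  # 1.96875 * 2^-15

# Exponent field 1 and up (below the all-ones field): they agree.
normal_ieee = ieee_half(0x0400)
normal_simplified = simplified(0x0400)
```

So the walkthrough matches binary16 for normal numbers, and only the exponent-field-0 patterns (plus the all-ones exponent, which binary16 reserves for infinities and NaN) decode differently.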