by Veserv 2241 days ago
For another visual explanation in words, floating point numbers (ignoring subnormals) are just a piecewise linear approximation of 2^x [1], with one linear piece for each integer interval (x = 4 to x = 5, etc.). As an example, draw a straight line between 2^4 (16) and 2^5 (32). The floating point numbers in that range are evenly spaced on that line.
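To check the "evenly spaced on that line" claim, here is a minimal Python sketch for single precision. The helper names (f32_from_bits, bits_from_f32, spacing_at) are made up for illustration, and it relies on the fact that adding 1 to the bit pattern of a positive finite float gives the next float up.

```python
import struct

def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit integer as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bits_from_f32(x: float) -> int:
    """Reinterpret a single-precision float as a 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def spacing_at(x: float) -> float:
    """Gap between x and the next larger single-precision float (x > 0)."""
    b = bits_from_f32(x)
    return f32_from_bits(b + 1) - f32_from_bits(b)

# Sample a few points inside each window and collect the distinct gaps.
gaps_16_32 = {spacing_at(v) for v in (16.0, 20.0, 31.0)}
gaps_32_64 = {spacing_at(v) for v in (32.0, 40.0, 63.0)}

print(gaps_16_32 == {2.0 ** -19})  # True: one gap size between 16 and 32
print(gaps_32_64 == {2.0 ** -18})  # True: the gap doubles in the next window
```

Within a window the gap is constant, and it doubles each time you step to the next window, which is the straight-line-per-piece picture.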

Another explanation using the window + offset terminology used in the post is that the offset is a percentage of the way through the window. So, for a window starting at 2^x, the difference between an offset of y and y + 1 is 2^(x-23), or 2^(-23) of 2^x. Put another way, floating point numbers do not have absolute error like integers (each number is within 1 of a representable value), but % error (each number is within a factor of 2^(-23) of a representable value). Essentially, floating point numbers use % error bars instead of absolute error bars.
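A rough way to see the % error bars in action: round a bunch of doubles of wildly different magnitudes to single precision and compare relative and absolute error. This is only a sketch; to_f32 is a hypothetical helper that rounds through the struct module, and the sample range is chosen to stay inside the normal single-precision range.

```python
import random
import struct

def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

random.seed(0)
worst_rel = 0.0
worst_abs = 0.0
for _ in range(10_000):
    # Values spread over about 200 windows, all normal in single precision.
    x = random.uniform(1.0, 2.0) * 2.0 ** random.randint(-100, 100)
    err = abs(to_f32(x) - x)
    worst_abs = max(worst_abs, err)
    worst_rel = max(worst_rel, err / x)

# The relative error stays bounded no matter the magnitude...
print(worst_rel <= 2.0 ** -23)
# ...while the absolute error is enormous for the large samples.
print(worst_abs)
```

With round-to-nearest the relative bound is actually half a step (2^(-24)), so the 2^(-23) figure is a safe upper bound.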

Using this model you can even see how to create your own floating point numbers. Just pick the % precision you want (for single FP that is 2^(-23), and for double FP 2^(-52)); that defines the range of your mantissa (offset). Then pick a range of x values you want to represent; that is the range of your exponent (window).
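As a sketch of the "create your own" idea, the hypothetical quantize function below rounds a positive number to a toy format defined by a mantissa width and a window range. It is an illustration under simplifying assumptions (positive inputs only, no subnormals, no infinities or NaN), not how any real format is implemented.

```python
import math

def quantize(x: float, mantissa_bits: int, min_exp: int, max_exp: int) -> float:
    """Round positive x to a toy float with the given precision and window range."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= m < 1
    window = e - 1            # so x = (2*m) * 2**window with 1 <= 2*m < 2
    steps = 1 << mantissa_bits
    # Offset = how far through the window we are, in units of 1/steps.
    offset = round((2 * m - 1) * steps)
    if offset == steps:       # rounded up into the next window
        window, offset = window + 1, 0
    if not min_exp <= window <= max_exp:
        raise ValueError("value falls outside the chosen exponent range")
    return math.ldexp(1 + offset / steps, window)

# 3 mantissa bits: 8 evenly spaced values per window.
print(quantize(20.3, 3, 0, 7))   # 20.0 (the window 16..32 has steps of 2)
# 23 mantissa bits and windows -126..127 mimic normal single precision.
print(quantize(1.0 + 2.0 ** -24 + 2.0 ** -30, 23, -126, 127) == 1.0 + 2.0 ** -23)  # True
```

Changing mantissa_bits changes the % precision, while min_exp and max_exp change only the range covered.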

As an aside, subnormal numbers do not respect this principle. They extend the expressible range for very small numbers by sacrificing % error for those numbers. In the very worst case, the smallest subnormal number, you can get 25% error (it might actually be 50%). As might be imagined, this plays havoc with error propagation: if you ever multiply by a number that just so happens to be the smallest subnormal, all your multiplies might suddenly be off by 25% instead of the normal 100 * 2^(-23)%. That is roughly 2,000,000 times the % error, which is quite a bit harder to compensate for. This is why many people consider subnormals to be a blemish.
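To put rough numbers on the loss of % error, here is a small sketch that measures the exact relative error of one multiplication using Fraction. It uses Python doubles rather than singles (so the normal bound is 2^(-53), not 2^(-23)), and rel_error is a hypothetical helper. The exact worst-case figure depends on what you measure, so treat the outputs as examples, not as the bound.

```python
from fractions import Fraction

TINY = 5e-324  # smallest positive subnormal double, 2**-1074

def rel_error(x: float, factor: float) -> Fraction:
    """Exact relative error of the rounded product x * factor."""
    exact = Fraction(x) * Fraction(factor)
    computed = Fraction(x * factor)
    return abs(computed - exact) / exact

# Normal operand: the rounding error is at most half a step, 2**-53.
print(rel_error(3.0, 1.1) <= Fraction(1, 2 ** 53))  # True

# Smallest subnormal: the only nearby values are 1x and 2x TINY.
print(float(rel_error(TINY, 1.1)))   # about 0.09: result snaps back to TINY
print(rel_error(TINY, 1.5))          # 1/3: result snaps up to 2 * TINY
```

So near the bottom of the subnormal range a single multiply can be off by tens of percent, many orders of magnitude worse than anywhere in the normal range.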

[1] The approximation is actually offset in the x direction for the bias. If you want to be more accurate, you are actually graphing 2^(x - 127).
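For the bias in [1], a minimal sketch of pulling a single-precision float apart into window and offset. decode_f32 and rebuild are illustrative names, and the sketch only handles normal numbers (no zero, subnormals, infinities or NaN).

```python
import struct

def decode_f32(x: float):
    """Split a normal single-precision float into (sign, window exponent, offset)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    stored_exp = (bits >> 23) & 0xFF   # 8-bit exponent field, stored with a bias
    offset = bits & 0x7FFFFF           # 23-bit mantissa field
    return sign, stored_exp - 127, offset

def rebuild(sign: int, exp: int, offset: int) -> float:
    """Inverse of decode_f32 for normal numbers."""
    return (-1.0) ** sign * (1 + offset / 2 ** 23) * 2.0 ** exp

print(decode_f32(16.0))   # (0, 4, 0): start of the 2**4 window
print(decode_f32(24.0))   # (0, 4, 4194304): offset 2**22, halfway through
print(rebuild(*decode_f32(24.0)))  # 24.0
```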

1 comments

On the other hand, if you don't have subnormals, then you have the funny property that subtracting two unequal numbers can yield 0. This never happens with subnormals because they allow representing numbers closer to zero, below the smallest representable exponent. The numerical stability of some algorithms crucially relies on subnormals.
I do not understand your statement. I can come up with two possible interpretations, but neither seems true to me. Can you provide an example of what you mean?

I am also interested in your statement that certain algorithms depend on subnormals as defined in the floating point spec. Can you provide an example of such an algorithm? I can intuit how it might be desirable to have a single "too small/epsilon" value, but I do not see offhand how you can leverage the full range of subnormals in any reasonably generic way that is not extremely dependent on the specific problem and scale (i.e. multiply all numbers by 2^30 and increase the maximum exponent by 30, do you get the same output multiplied by 2^30), so I would like to see how it is done.

In terms of my two possible interpretations, the first is that subtracting two unequal floating point numbers yields 0. I am pretty sure this is not the case. It may yield no change, but I am pretty sure it cannot yield 0. The other is that subtracting two unequal arbitrary precision numbers represented as floating point numbers yields 0. This is true, but it is a known limitation of emulating arbitrary precision arithmetic using limited precision and must always be accounted for. If this is what you meant, then we can just choose numbers too small to be expressed by subnormals to cause the problem to occur again, so all it does is allow handling of a few more cases at the cost of complexity and non-uniformity. If you did not mean either of these two interpretations, can you explain what you meant, preferably with a concrete example?

If x and y are finite floats (not NaN or ±Infinity), with x != y, then x - y != 0. This is true for every pair of floats only because of denormal numbers. If there were no denormals (or denormals were flushed to zero), then whenever |x - y| < 2^-126 (the smallest normal single-precision float), x - y would become zero.
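A concrete example of this, sketched in Python for single precision. Python has no flush-to-zero mode, so the hypothetical flush_to_zero helper below only imitates what hardware without denormals would do; to_f32 is likewise an illustrative helper.

```python
import struct

MIN_NORMAL_F32 = 2.0 ** -126  # smallest positive normal single-precision float

def to_f32(v: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]

def flush_to_zero(v: float) -> float:
    """Imitate hardware without denormals: tiny results become zero."""
    return 0.0 if abs(v) < MIN_NORMAL_F32 else v

x = to_f32(1.5 * 2.0 ** -126)   # a normal float
y = to_f32(2.0 ** -126)         # the smallest normal float

diff = to_f32(x - y)            # 2**-127, representable only as a denormal
print(x != y)                   # True
print(diff != 0.0)              # True: with denormals, x != y implies x - y != 0
print(flush_to_zero(diff))      # 0.0: without them, two unequal floats subtract to zero
```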