by svat 1256 days ago
If you have only a couple of minutes to develop a mental model of floating-point numbers (and you have none currently), the most valuable thing IMO would be to spend them staring at a diagram like this one: https://upload.wikimedia.org/wikipedia/commons/b/b6/Floating... (uploaded to Wikipedia by user Joeleoj123 in 2020, made using Microsoft Paint) — it already covers the main things you need to know about floating-point, namely there are only finitely many discrete representable values (the green lines), and the gaps between them are narrower near 0 and wider further away.
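
For anyone who would rather poke at the gaps than stare at the picture, here is a minimal Python sketch. It assumes binary64 (Python's `float`) and Python 3.9+ for `math.ulp`; the sample magnitudes are arbitrary choices of mine, not taken from the diagram.

```python
import math

# math.ulp(x) returns the spacing between x and the next representable
# float64 of larger magnitude, i.e. the local gap between "green lines".
for x in (1e-300, 1.0, 1e16, 1e300):
    print(f"x = {x:.0e}   gap to the next float = {math.ulp(x):.3e}")

gap_at_1 = math.ulp(1.0)      # 2**-52
gap_at_1e16 = math.ulp(1e16)  # 2.0: not even every integer is representable here

# Adding less than half a gap leaves the value unchanged.
unchanged = (1e16 + 0.5 == 1e16)
```

The gap grows with the magnitude of the number, which is the whole "narrower near 0, wider further away" picture in four lines of output.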

With just that understanding, you can understand the reason for most of the examples in this post. You avoid both the extreme of thinking that floating-point numbers are mathematical (exact) real numbers, and the extreme of "superstition" like believing that floating-point numbers are some kind of fuzzy blurry values and that any operation always has some error / is "random", etc. You won't find it surprising that 0.1 + 0.2 ≠ 0.3, while 1.0 + 2.0 will always give 3.0, and yet 100000000000000000000000.0 + 200000000000000000000000.0 ≠ 300000000000000000000000.0. :-) (Sure this confidence may turn out to be dangerous, but it's better than "superstition".) The second-most valuable thing, if you have 5–10 minutes, may be to go to https://float.exposed/ and play with it for a while.
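
The three sums above can be checked directly. A small sketch, assuming binary64 as used by Python's `float` (the variable names are mine):

```python
# Each literal is first snapped to the nearest representable float64,
# then the sum is snapped again.
small_fractions = (0.1 + 0.2 == 0.3)
small_integers = (1.0 + 2.0 == 3.0)
large_integers = (
    100000000000000000000000.0 + 200000000000000000000000.0
    == 300000000000000000000000.0
)

print(small_fractions)  # False: 0.1, 0.2 and 0.3 are not representable
print(small_integers)   # True: 1, 2 and 3 are representable, so is their sum
print(large_integers)   # False: out here the gaps are far wider than 1
print(repr(0.1 + 0.2))  # 0.30000000000000004
```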

Anyway, great post as always from Julia Evans. Apart from the technical content, her attitude is really inspiring to me as well, e.g. the contents of the “that’s all for now” section at the end.

The page layout example ("example 7") illustrates the kind of issue that led Knuth to avoid floating-point arithmetic in TeX (except where it doesn't matter) and do everything with scaled integers (fixed-point arithmetic). (It was even worse then, before IEEE 754.)

I think things like fixed-point arithmetic, decimal arithmetic, and maybe even exact real arithmetic / interval arithmetic are actually more feasible these days, and it's no longer obvious to me that floating-point should be the default that programming languages guide programmers towards.
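
To make those alternatives a bit more concrete, here is a rough sketch using only Python's standard library. `decimal.Decimal` and `fractions.Fraction` are real modules; `to_fixed` is a hypothetical helper of mine, and the scale of 65536 units per point is borrowed from TeX's scaled points purely for illustration, not as a description of TeX's actual conversion routine.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal arithmetic: 0.1, 0.2 and 0.3 are all exactly representable.
decimal_ok = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# Exact rational arithmetic: no rounding at all.
rational_ok = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

# Fixed-point arithmetic: store lengths as integer multiples of 1/SCALE.
SCALE = 65536  # assumption: 65536 units per point, as in TeX's "sp"

def to_fixed(text: str) -> int:
    """Hypothetical helper: round a decimal string to the nearest scaled integer."""
    return round(Fraction(text) * SCALE)

a, b, c = to_fixed("0.1"), to_fixed("0.2"), to_fixed("0.3")

# Integer addition is exact, so grouping never changes the result...
fixed_associative = ((a + b) + c == a + (b + c))
# ...whereas with binary floats the grouping can matter.
float_associative = ((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

print(decimal_ok, rational_ok, fixed_associative, float_associative)
```

Note that fixed-point still rounds once, at conversion time; what you get in exchange is that every later addition is exact and gives the same answer on every machine.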


If you have even less time, just think of them as representing physical measurements made with practical instruments and the math done with analog equipment.

The usual cause of floating-point problems is treating them as a mathematical ideal. The quirks appear at the extremes when you try to do un-physical things with them. You can't measure exactly 0 V with a voltmeter, or use an instrument for measuring the distance to stars and then add a length obtained from a micrometer without entirely losing the latter's contribution.
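
The star-plus-micrometer case is easy to reproduce. A minimal sketch, assuming binary64 and metres as the unit; both figures are illustrative values I picked, not real measurements:

```python
import math

distance_to_star_m = 4.0e16    # roughly the scale of the nearest star, in metres
micrometer_reading_m = 2.5e-5  # a length you might read off a micrometer

total = distance_to_star_m + micrometer_reading_m

# The gap between adjacent floats near 4e16 is several metres, so the
# micrometer reading is far below half a gap and vanishes in the sum.
print(math.ulp(distance_to_star_m))
print(total == distance_to_star_m)  # True
```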

Thanks, I actually edited my post (made the second paragraph longer) after seeing your comment. The "physical" / "analog" idea does help in one direction (prevents us from relying on floating-point numbers in unsafe ways) but I think it brings us too close to the "superstition" end of the spectrum, where we start to think that floating-point operations are non-deterministic, start doubting whether we can rely on (say) the operation 2.0 + 3.0 giving exactly 5.0 (we can!), whether addition is commutative (it is, if working with non-NaN floats) and so on.
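
A quick way to convince yourself of these guarantees is to test them. This is only a spot check under my own assumptions (binary64, a fixed random seed, arbitrary ranges), not a proof:

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = [(rng.uniform(-1e6, 1e6), rng.uniform(-1e6, 1e6)) for _ in range(100_000)]

# Addition of non-NaN floats is commutative: a + b and b + a are the same
# correctly rounded result.
commutative = all(a + b == b + a for a, b in pairs)

# Sums of small integers are exact, because every value involved is representable.
small_ints_exact = all(
    float(i) + float(j) == float(i + j) for i in range(200) for j in range(200)
)

# NaN is the exception: NaN never compares equal to anything, itself included.
nan = float("nan")
nan_breaks_equality = (nan + 1.0 != 1.0 + nan)

print(commutative, small_ints_exact, nan_breaks_equality)  # True True True
```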

You could argue that it's "safe" to distrust floating-point entirely, but I find it more comforting to be able to take at least some things as solid and reason about them, to refine my mental model of when errors can happen and not happen, etc.

Edit: See also the floating point isn’t “bad” or random section that the author just added to the post (https://twitter.com/b0rk/status/1613986022534135809).

> whether we can rely on (say) the operation 2.0 + 3.0 giving exactly 5.0 (we can!)

Can we rely upon 2.3 + 2.3 giving exactly 4.6 though?

Can we rely upon LargeInt.0 + LargeInt.0 giving exactly 2xLargeInt.0 for all integers?

That's exactly my point: when you internalize the diagram, you'll be able to reason confidently about what happens:

• In the case of "2.3 + 2.3", each "2.3" is “snapped” to the nearest representable value (green line in the diagram), then their sum snapped to the nearest green line. In this case, because the two summands are equal and we're using binary floating-point, the result will also be a green line. If you knew more about the details of binary32 aka float32, you could confidently say that "2.3" means 2.29999995231628417969 (https://float.exposed/0x40133333) and be sure of what "2.3 + 2.3" would give (4.59999990463256835938 = https://float.exposed/0x40933333) and that this is indeed the closest representable value to 4.6 (so yes, we can rely on 2.3 + 2.3 giving the same value as what “4.6” would be stored as, i.e. "2.3 + 2.3 == 4.6" evaluating to True), but even without learning the details you can go pretty far. For instance, you know you can rely on "x + x" and "2 * x" giving the same value for any (non-NaN) value x.

• I already gave the example of 100000000000000000000000.0 + 200000000000000000000000.0 ≠ 300000000000000000000000.0 above, but for the specific case of "x + x" and "2 * x" yes we can rely upon them evaluating to the same value (unless 2*x is Infinity or NaN), though of course the large integer x may itself not be representable exactly. Again, with the mental model, you'll be in a better position to state what you expect by "exactly".
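
The snapping described in the two bullets above can be reproduced with the standard `struct` module, which can round a Python float to binary32 and expose its bit pattern. `to_float32` and `float32_bits` are helper names I made up for this sketch; the `x + x == 2 * x` check at the end is a random spot check, not a proof.

```python
import random
import struct
from decimal import Decimal

def to_float32(x: float) -> float:
    """Round x to the nearest binary32 value (returned as a Python float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_bits(x: float) -> int:
    """Bit pattern of the binary32 value nearest to x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

x = to_float32(2.3)
print(hex(float32_bits(2.3)), Decimal(x))   # bit pattern and exact stored value

# Doubling only bumps the exponent, so the sum lands on a representable value.
total = to_float32(x + x)
print(hex(float32_bits(total)), Decimal(total))
print(total == to_float32(4.6))             # same value that "4.6" snaps to

# x + x and 2 * x agree for ordinary (non-NaN) float64 values as well.
rng = random.Random(1)
doubling_agrees = all(
    v + v == 2 * v for v in (rng.uniform(-1e300, 1e300) for _ in range(100_000))
)
```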

> Again, with the mental model, you'll be in a better position to state what you expect by "exactly".

I'm long in the tooth, if I want exactly I'll use a symbolic algebra program (such as Cayley|Magma).

I don't expect exactly when using floats (or doubles, etc), and my mental model includes the image of the "fine teeth of coverage" having uneven spacing. That the float graticule across the reals is uneven, and at best semi-quasi-regular, has largely been my greatest issue (in geophysical computation engine development).

> If you knew more about the details of binary32 aka float32, you could confidently say that "2.3" means 2.29999995231628417969 […]

Then I think writing

    let f = 2.3;
should be a compile error. The compiler should force you to write the “snapped” value in order to not mislead. :)

Now try writing the "snapped" value for 2.3 as a finite decimal. :-)
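
For the curious, the snapped value can at least be printed. Every binary floating-point value is an integer divided by a power of two, so its decimal expansion is finite, just long. A small sketch, assuming binary64 (Python's `float`) rather than whatever type `f` would have in the snippet above:

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) converts exactly, with no rounding to a short repr.
snapped = Decimal(2.3)
print(snapped)

# The same value as an exact ratio; the denominator is a power of two.
numerator, denominator = (2.3).as_integer_ratio()
is_power_of_two = denominator & (denominator - 1) == 0
print(numerator, denominator, is_power_of_two)
```

Whether anyone would want to type that literal into source code is another matter.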