Hacker News
by bicsi 1338 days ago
The story is much worse than what is presented in the article, especially when talking about floating point errors that add (or rather multiply) up.

More often than not, the error is relative wrt the greatest magnitudes in the intermediate calculations. In essence, if you subtract two floating point numbers, you’re kinda screwed, because you can never handle cases like A - B with good precision when A and B are big enough numbers: the rounding error already carried by A and B scales with their magnitude, not with the size of the difference. Not to mention more complicated operations like trig functions.
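A small Python sketch of the effect described above. The values `big` and `small` and the helper `recovered_small` are made up for illustration; the only assumption is that Python floats are IEEE 754 doubles, which is the case on common platforms.

```python
def recovered_small(big: float, small: float) -> float:
    """Add `small` to `big`, then subtract `big` again."""
    return (big + small) - big


big = 1e8      # exactly representable as a double
small = 1e-8   # below the spacing between adjacent doubles near 1e8 (2**-26)

recovered = recovered_small(big, small)

# Error measured relative to the value we were trying to get back.
relative_error = abs(recovered - small) / small

print(recovered)
print(relative_error)
```

The value that comes back is a whole number of steps of the grid around 1e8, not the 1e-8 that went in, so the error is tiny relative to `big` but very large relative to `small`.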

In my opinion, one should avoid floating point as much as possible. And not only when testing for equality (all comparisons suffer from this).

Or, of course, ignore FPEs and proceed at your own risk.

1 comments

This is not the issue. Floating point numbers have problems when A and B differ greatly in magnitude: then A-B might easily be equal to A, and so ((A-B)-A)*LARGENUMBER can still be equal to zero, even if B was e.g. 1.
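The expression above is easy to try out. A minimal sketch in Python, with arbitrarily chosen values for A and LARGENUMBER (any A much larger than 2^53 would do), assuming IEEE 754 doubles:

```python
A = 1e20            # spacing between adjacent doubles here is 16384
B = 1.0             # far below half of that spacing
LARGENUMBER = 1e30  # arbitrary large factor

absorbed = (A - B) == A                  # B vanishes in the subtraction
result = ((A - B) - A) * LARGENUMBER     # exact real-number answer: -1e30

print(absorbed)
print(result)
```

With real numbers the result would be -B * LARGENUMBER; with doubles, B is absorbed before the multiplication ever sees it.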

This is not a problem with floats. This is an inherent result of the design constraints. You cannot fix it, and there are no alternatives without this flaw which offer any of the same benefits as floats.

The result of any basic arithmetic operation between two floats (under IEEE 754's default rounding) is the floating point number closest to the result calculated with real numbers; that is the magic of floats.
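One way to check this claim for individual cases is to redo the operation in exact rational arithmetic and round once at the end. The sketch below uses Python's `fractions.Fraction`; the helper names and the sample pairs are illustrative only, and it assumes IEEE 754 doubles with round-to-nearest.

```python
from fractions import Fraction


def sum_is_correctly_rounded(a: float, b: float) -> bool:
    """Compare a + b with the exact sum of a and b, rounded once to a float."""
    exact = Fraction(a) + Fraction(b)   # Fraction(float) is exact
    return a + b == float(exact)


def product_is_correctly_rounded(a: float, b: float) -> bool:
    """Same check for multiplication."""
    exact = Fraction(a) * Fraction(b)
    return a * b == float(exact)


pairs = [(0.1, 0.2), (1e16, 1.0), (1e8, 1e-8), (3.0, 1 / 3)]

all_sums_ok = all(sum_is_correctly_rounded(a, b) for a, b in pairs)
all_products_ok = all(product_is_correctly_rounded(a, b) for a, b in pairs)

print(all_sums_ok, all_products_ok)
```

Note that this says nothing about a chain of operations: each step is rounded optimally, but the rounded inputs to the next step are still what they are.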

Floating point numbers are a near magical type, which for many applications are not only a good choice, but the only one which makes any sense. It is near impossible to imagine modern engineering without them.

Indeed. The only way to accommodate numbers where the difference in order of magnitude exceeds the type's significand length is to not combine them: keep them separate and operate on them separately, combining them only when a value needs to be sampled.

One good common example is numerical integration. In any sufficiently fine-grained posteriori simulation, even with modest limits on position, delta velocity can be too small to be preserved when added to position, i.e. when delta velocity is more than 2^52 times smaller than position (for a 64-bit double, whose significand stores 52 bits). Keeping a separate accumulator is the only way to handle this without arbitrarily increasing precision with software FP.
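A rough sketch of the separate-accumulator idea in Python. The function names, the starting position, the step size and the step count are all invented for illustration, and this is only one simple way to arrange it, assuming IEEE 754 doubles:

```python
def integrate_naive(position: float, delta: float, steps: int) -> float:
    """Add each tiny increment directly to the large position."""
    for _ in range(steps):
        position += delta   # lost every time if delta is below half the spacing
    return position


def integrate_split(position: float, delta: float, steps: int) -> float:
    """Accumulate the tiny increments separately, combine only when sampling."""
    offset = 0.0
    for _ in range(steps):
        offset += delta     # offset has a similar magnitude to delta, so bits survive
    return position + offset


start = 1.0
delta = 1e-17      # below half the spacing between doubles near 1.0
steps = 100_000

naive = integrate_naive(start, delta, steps)
split = integrate_split(start, delta, steps)

print(naive)
print(split)
```

The naive loop never moves off the starting position, while the split version ends up close to start + steps * delta. In a longer run the accumulator would occasionally be folded into the position once it has grown large enough to register.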