Hacker News new | ask | show | jobs
by jepler 3364 days ago
The operator overlooks my pet peeve when it comes to comparing floating-point numbers: in C++ the "standard associative containers" (std::set and std::map) are based on ordering using an less-than relationship which must be a "strict weak ordering". Many times, the methods suggested for comparing floating point types do not satisfy the requirements of "strict weak ordering", and in this case the C++ standard says you've entered the realm of undefined behavior. In the code at $DAY_JOB, "undefined behavior" turned out in this case to include such pleasant side-effects as double frees(!).

Specifically: when you have a less-than relationship "<", then !(a<b) && !(b<a) implies that a and b are equal (a==b). And if a==b and b==c then it must be the case that a==c, or the requirements of the ordering predicate are not met. Unfortunately, under most of these FP comparison schemes, for numbers a and b that are "close but not too close", it's the case that a<b, but for x=(a+b)/2, a==x and x==b!

3 comments

This is actually caused by the use of the x87 80 bit floating point registers. (Infamous GCC bug #323)

When the float is first inserted into the set/map it has 80 bit precision, but that gets truncated to float or double precision during the store. This breaks the ordering as you are saying, but it's not an inherent flaw with floats as such.

The problem goes away if you compile with -mfpmath=sse because then the math will be performed in the same precision as the storage format.

Bug #323 is responsible for a huge amount of mistrust of floats that they don't deserve. Other compilers don't have this problem because they truncate the floats before any comparison.

Yes, though I'm specifically talking about when you decide to define a 'bool less_than(double, double)' that uses some kind of fuzzy comparison approach internally. This can affect any platform, not just one with the "bug #323" behavior in it.
You don't need (and shouldn't do) any kind of fuzzy comparison for less than for floats.
You sure might imagine you need to. For instance, suppose you want to average pieces of data that are timestamped "almost the same", but could arrive to be processed with varying delays, including out of order arrival (so you can't just say "is this datum at 'about the same time as' the datum received just prior"). There are better approaches, but the one I inherited involved using a std::map which used a fuzzy less than as the ordering predicate; and my main task was to diagnose why, once in a blue moon, a segmentation fault could occur when doing some operation on the map (insertion, I think).
It seems more logical to pre-quantise the timestamps before insertion. (Actual solution I've seen practiced)
As bug #323 points out, truncating floats is an incomplete solution and brings its own problems. The GNU people were not simply being lazy or ignorant, their approach was valid.

The "bug" is x87 design, period. And x87 is the past. It's old, and bad. SSE and IEEE 754 is the present.

> Specifically: when you have a less-than relationship "<", then !(a<b) && !(b<a) implies that a and b are equal (a==b). And if a==b and b==c then it must be the case that a==c, or the requirements of the ordering predicate are not met.

¬(a<b) ∧ ¬(b<a) → a=b is not, in fact, a requirement on <. Rather, the point is that behind the scenes, any two elements satisfying ¬(a<b) ∧ ¬(b<a) are treated as equivalent by these containers.

To see the difference, consider the (somewhat counter-intuitive) behavior of NaNs: for any two NaNs m, n, we have ¬(m<n) and ¬(n<m), yet also m≠n. If that implication were an actual requirement, then the usual ordering < on floats would not be a suitable ordering predicate. What happens, though, is that the containers will implicitly treat NaNs as equivalent, i.e., the notion of equivalence the container uses for the elements depends only on < and might not coincide with the usual ==.

Ah! fond memories, related one of the most vexing and entertaining bug that I discovered in my code.

I was using C++'s std::sort and soon enough things would go wrong. It would crash at unpredictable times, I suspected I was corrupting memory somewhere. Parts of my data structures would get overwritten by parts from some other data structure. I checked and checked and checked my code, nothing seemed wrong.

Its only after opening the covers, when I started peering into the sort that I realized my mistake. I was passing the comparison operator that was a "less than equal" relation.