Hacker News
by tomrod 2651 days ago
This article highlights an issue with floating point numbers, a substantial use case for data scientists (and as such, I value the input).

How do REPLs and databases handle this edge case?

4 comments

Almost always they ignore this kind of issue; the best you're likely to get is a mean() function that remembers to sort the input first. Most numbers are in a "human" range far from the limits.
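To make the "sort the input first" point concrete, here is a minimal sketch, assuming IEEE 754 doubles. The helper name `running_sum` and the sample data are made up for illustration. It uses an explicit loop rather than the built-in `sum()`, since newer Python versions may compensate float sums internally and hide the effect.

```python
def running_sum(values):
    """Plain left-to-right float accumulation (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total

# Illustrative data: one large value followed by ten small ones.
data = [1e16] + [1.0] * 10

# In input order, each 1.0 is rounded away against 1e16 (doubles are 2.0 apart there).
naive_mean = running_sum(data) / len(data)

# Sorted by magnitude, the small values accumulate to 10.0 before meeting 1e16.
sorted_mean = running_sum(sorted(data, key=abs)) / len(data)
```

Sorting only helps with this kind of absorption; it does nothing for overflow, and a compensated sum such as `math.fsum` is usually the more robust choice.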
If you actually hit overflow in a double while calculating an average, without first losing so much precision that the calculation is worthless anyway, then some of the numbers in the list are bogusly big to begin with.
Or the list is just very long.
Not really. The time required to overflow that way is unrealistic. Also, I think you'll run into S + x = S first; at that point your sum will stop climbing towards overflow.
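The S + x = S effect is easy to reproduce. A small sketch, assuming IEEE 754 doubles (what Python's `float` uses on common platforms); the values are chosen only for illustration:

```python
# At 2**53, adjacent doubles are 2.0 apart, so adding 1.0 rounds back to S.
S = 2.0 ** 53
x = 1.0
absorbed = (S + x == S)

# A running sum of 1.0s therefore stalls at 2**53 instead of climbing further.
total = S
for _ in range(1000):
    total += x
stalled = (total == S)
```

Both `absorbed` and `stalled` come out true here: once the sum dwarfs the addends, it stops growing long before it gets anywhere near the overflow limit.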
Data science is squishy to begin with -- those wanting high performance are heading to fixed-point, hardware-accelerated solutions where loss of precision is a given, so not getting a fully accurate answer from high-precision floating point at every step of solving the problem doesn't seem like a big deal.
One option is to convert to a bignum representation, which is slow but works.
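A rough sketch of the bignum route in Python, using the standard library's `fractions.Fraction` as the exact representation. The function name `exact_mean` and the sample data are assumptions for illustration, and it only handles finite inputs.

```python
from fractions import Fraction

def exact_mean(values):
    """Mean of finite floats via exact rational arithmetic (slow but exact)."""
    values = list(values)
    if not values:
        raise ValueError("mean of empty sequence")
    # Fraction(float) converts each double exactly, so the sum cannot overflow or round.
    total = sum((Fraction(v) for v in values), Fraction(0))
    # Round only once, at the very end.
    return float(total / len(values))

# Illustrative data: each value fits in a double, but their sum does not.
data = [1e308, 1e308, 1e308]

naive_total = 0.0
for v in data:
    naive_total += v            # overflows to inf on the second addition
naive = naive_total / len(data)

exact = exact_mean(data)
```

The naive mean ends up infinite while the exact one returns the true mean, at the cost of arbitrary-precision arithmetic on every element.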