Hacker News
by ChrisSD 2490 days ago
Maybe I'm missing something but what's wrong with rounding floats this way?

Rounding a number is, in the common case, multiplying it by a power of ten, truncating to an integer, and dividing by the same power of ten. You do have to handle extremely high exponents, but even the logic for that is not complex.

Example of implementing it the sane way: https://github.com/numpy/numpy/blob/75ea05fc0af60c685e6c071d...
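A minimal pure-Python sketch of the multiply / round / divide approach, for comparison. `scale_round_unscale` is a hypothetical name and this is not NumPy's source. It uses Python's `round()` to the nearest integer, which breaks ties to even, and NumPy's `np.round` is documented to round halves to even as well. It reproduces the corner case from the numpy session further down the thread:

```python
def scale_round_unscale(x: float, ndigits: int) -> float:
    """Round x to ndigits decimal places by scaling with a power of ten.

    Hypothetical helper illustrating the multiply / round / divide idea;
    not NumPy's actual implementation.
    """
    scale = 10.0 ** ndigits
    scaled = x * scale            # this product is itself rounded to the nearest double
    return round(scaled) / scale  # round() gives an int; dividing by the float scale gives a float


print(scale_round_unscale(56294995342131.5, 2))  # 56294995342131.5
print(scale_round_unscale(56294995342131.5, 3))  # 56294995342131.51
```

The built-in `round(56294995342131.5, 3)` returns `56294995342131.5` for the same input.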

Every step of this function is complex and expensive; printing a float as a decimal, in particular, is very complex. And round is routinely used in tight loops.

The numpy approach sacrifices correctness for speed (you sometimes get unexpected results in corner cases, see below); the CPython approach sacrifices speed for correctness.

  >>> round(56294995342131.5, 2)
  56294995342131.5
  >>> round(56294995342131.5, 3)
  56294995342131.5

  >>> import numpy as np
  >>> np.round(56294995342131.5, 2)
  56294995342131.5
  >>> np.round(56294995342131.5, 3)
  56294995342131.51
The problem is the scaling around the truncation: in binary, multiplying or dividing by a power of 10 is generally inexact, so the value you get back can differ from the one you started with.
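A worked check of the numpy example using `fractions.Fraction`, which holds the exact rational value of a double. Both scaling steps are rounded to the nearest double. In this particular case the multiplication by 1000 is already inexact, so the (correctly rounded) division cannot recover the original value:

```python
from fractions import Fraction

x = 56294995342131.5            # exactly representable: 112589990684263 / 2
product = x * 1000              # the double nearest to the true product
exact = Fraction(x) * 1000      # exact rational product, no rounding

print(exact)                    # 56294995342131500
print(Fraction(product))        # 56294995342131504: doubles in this range are spaced 8 apart
print(product / 1000)           # 56294995342131.51: dividing back does not give x
```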
How does truncating a positive number ever round up?
Add 0.5 before rounding toward negative infinity, and you'll get standard (round-half-up) rounding.
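A sketch of that suggestion with `math.floor`; `round_half_up` is a hypothetical name. Note that it differs from Python's own `round()`, which sends ties to the even neighbour, and that the `x + 0.5` addition is itself rounded, which bites just below 0.5:

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, ties toward +infinity, via floor(x + 0.5)."""
    return math.floor(x + 0.5)


print(round_half_up(2.5), round(2.5))       # 3 2  (built-in round() ties to even)
print(round_half_up(-2.5))                  # -2   (ties go toward +infinity, not away from zero)
# Largest double below 0.5: x + 0.5 rounds up to exactly 1.0 before floor() sees it.
print(round_half_up(0.49999999999999994))   # 1
```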
I don't know that it is "wrong", just unexpected. I suspect most people expect all math functions to be purely implemented in numerical terms, so finding string manipulation is surprising/interesting.
> I don't know that it is "wrong", just unexpected. I suspect most people expect all math functions to be purely implemented in numerical terms, so finding string manipulation is surprising/interesting.

You kind of got me thinking now. The decimal representation of a number is really a string representation (in the sense of a certain sequence of characters). Hence rounding to a certain decimal place is essentially a string operation. You can of course do it by (say) dividing by 10^whatever or something else in some numerical fashion, but the more I think about it, the more natural it is to just think of the whole thing as a string.
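One way to see the string view in practice, using the standard `decimal` module: rounding to two places depends on which decimal string you start from, the exact expansion of the stored double or the short string Python prints for it:

```python
from decimal import Decimal, ROUND_HALF_EVEN

x = 2.675
exact = Decimal(x)           # exact decimal expansion of the stored binary value
shortest = Decimal(repr(x))  # the short string Python prints: '2.675'
places = Decimal("0.01")

print(exact)                                                # 2.67499999999999982236431605997495353221893310546875
print(exact.quantize(places, rounding=ROUND_HALF_EVEN))     # 2.67
print(shortest.quantize(places, rounding=ROUND_HALF_EVEN))  # 2.68
print(round(x, 2))                                          # 2.67, agrees with the exact expansion
```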

Or you could flip it around and consider that the string manipulation can also be described numerically, so whether you consider the operation a string operation or a numerical operation is sort of irrelevant. It's just a point of view.

I think the best way to think about it is as a symbolic representation. We have processes for manipulating symbols to achieve the correct results. Purely numeric (binary-based) operations just happen to allow some quicker shortcuts, but they sometimes lead to lost information.
This is one of the core figure/ground themes in Gödel, Escher, Bach.
>The decimal representation of a number is really a string representation

It is incorrect to speak of "the" decimal representation of a number, as many numbers have non-unique decimal representations, the most famous example being 1.000...=0.999...

The definition that makes it unique is the shortest representation in which trailing zeros are not included; in your example that would be 1. Note that this definition comes up in binary floating point: infinitely many decimal strings round to a given binary64 float, and the one chosen is the shortest (and the closest to the binary64 value when several are equally short).
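An illustration with Python, whose `repr()` of a float in current CPython is the shortest string that round-trips to the same binary64 value. Longer strings map to the same double too:

```python
from decimal import Decimal

x = 0.1
print(Decimal(x))  # exact value: 0.1000000000000000055511151231257827021181583404541015625
print(repr(x))     # '0.1', the shortest string that reads back as the same double

# Several other decimal strings also round back to the same binary64 value.
for s in ["0.1", "0.10000000000000001", "0.1000000000000000055511151231257827"]:
    assert float(s) == x
```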
> It is incorrect to speak of "the" decimal representation of a number, as many numbers have non-unique decimal representations, the most famous example being 1.000...=0.999...

That's true and I'll concede that point, but it's not really relevant to what I said. That just means some numbers have different string representations that represent the same object. That doesn't really contradict anything in my post except the use of the definite article.

There are many corner cases involved with rounding, and the folks who did the string conversion had to put a lot of effort into handling all of them. It makes sense to piggyback on their efforts, even if it isn't the most efficient way 99% of the time.
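A rough Python-level analogue of piggybacking on the string conversion: format to `ndigits` places with `'f'`, then parse back. `round_via_string` is a hypothetical helper, not CPython's C code; it only handles `ndigits >= 0`, and on the cases checked here it agrees with the built-in `round()`:

```python
def round_via_string(x: float, ndigits: int) -> float:
    """Round x to ndigits decimal places by formatting and reparsing (hypothetical helper)."""
    if ndigits < 0:
        raise ValueError("this sketch only handles ndigits >= 0")
    # 'f' formatting yields correctly rounded decimal digits of the exact binary
    # value (ties to even); float() parses them back to the nearest double.
    return float(format(x, f".{ndigits}f"))


for value, places in [(56294995342131.5, 3), (2.675, 2), (0.125, 2), (2.5, 0), (-1.5, 0)]:
    assert round_via_string(value, places) == round(value, places)
```

The price is the one discussed above: every call produces and parses a decimal string.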
And then strings are text implemented in numerical terms
Python already doesn't have the best performance. If you need to round a lot of floats in a loop, you'd better bring some time.
Or use numpy, like the rest of us.
round() is specifically about rounding to decimal places, and there are other, faster functions for the more common cases
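For the whole-number case, the built-in `round(x)` with no `ndigits` and `math.floor`, `math.ceil`, `math.trunc` return ints directly; as far as I know none of them goes through a decimal string. A quick illustration (speed not measured here):

```python
import math

x = 2.5
print(round(x))        # 2    int, ties to even
print(math.floor(x))   # 2    toward -infinity
print(math.ceil(x))    # 3    toward +infinity
print(math.trunc(-x))  # -2   toward zero
```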
They're using base 10, which is much slower here than a power-of-2 base would be. Plus there's the memory management for the string.
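Why a power-of-2 base is cheaper, as a sketch: scaling a double by 2**k only changes its exponent, so `math.ldexp` is exact unless it overflows or underflows, and only the rounding step loses information. `round_to_binary_places` is a hypothetical helper; it rounds to binary places, so it does not replace `round(x, n)` for decimal places:

```python
import math


def round_to_binary_places(x: float, bits: int) -> float:
    """Round x to the nearest multiple of 2**-bits (hypothetical helper)."""
    # ldexp(x, bits) == x * 2**bits exactly; round() to int ties to even.
    return math.ldexp(round(math.ldexp(x, bits)), -bits)


print(round_to_binary_places(0.1, 4))               # 0.125, the nearest multiple of 1/16
print(round_to_binary_places(56294995342131.5, 3))  # 56294995342131.5, unchanged
```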
The memory management will virtually never kick in. You'd need a number which expands to more than 100 characters for that to happen.
The two concerns I have are performance and correctness. I don’t know enough about the implementation of round(3) to know... perhaps someone else does?
This approach is used specifically because of correctness. Doing things the 'obvious' way with round(3) or truncation introduces precision problems in corner cases.
That's what round(3) is for
round(3) can only round to an integer. Python's round works to an arbitrary decimal position.
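For reference, Python's built-in `round()` takes an optional `ndigits`, which may also be negative to round to tens, hundreds and so on:

```python
print(round(1234.5678, 2))   # 1234.57
print(round(1234.5678, -1))  # 1230.0
print(round(1234.5678, -2))  # 1200.0
```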

It would previously scale up, round (ceil/floor really) then scale down. That turned out to induce severe precision issues: https://bugs.python.org/issue1869

That's the first thing I'd try, multiply by 10^digits, round, divide by 10^digits. Thanks for the link.