by coldtea 2490 days ago
Seems to be one of the best ways to go about it.

From the comment in protobuf source (which does the same thing as Python), mentioned in the Twitter thread:

(...) An arguably better strategy would be to use the algorithm described in "How to Print Floating-Point Numbers Accurately" by Steele & White, e.g. as implemented by David M. Gay's dtoa(). It turns out, however, that the following implementation is about as fast as DMG's code. Furthermore, DMG's code locks mutexes, which means it will not scale well on multi-core machines. DMG's code is slightly more accurate (in that it will never use more digits than necessary), but this is probably irrelevant for most users.

Rob Pike and Ken Thompson also have an implementation of dtoa() in third_party/fmt/fltfmt.cc. Their implementation is similar to this one in that it makes guesses and then uses strtod() to check them. (...)
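For readers who want to see the guess-and-check idea from the quoted comment in runnable form, here is a minimal Python sketch. It is not the protobuf code: the function name `shortest_roundtrip_repr` and the choice of trying 15, 16, then 17 significant digits are assumptions made for illustration, and Python's `float()` stands in for `strtod()`.

```python
def shortest_roundtrip_repr(value: float) -> str:
    """Format with increasing precision until parsing recovers the input."""
    text = ""
    for digits in (15, 16, 17):
        # Guess: format with this many significant digits.
        text = "%.*g" % (digits, value)
        # Check: parse the guess back and compare with the original.
        if float(text) == value:
            return text
    # Only reached for NaN, which never compares equal to itself.
    return text


print(shortest_roundtrip_repr(0.1))        # 0.1
print(shortest_roundtrip_repr(0.1 + 0.2))  # 0.30000000000000004
```

17 significant digits are always enough to round-trip an IEEE 754 double, so the loop only falls through for NaN.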

https://github.com/protocolbuffers/protobuf/blob/ed4321d1cb3...


>Seems to be one of the best ways to go about it.

The C/C++ standards do not require formatting to round correctly or even be portable. I recently had an issue where a developer used this method to round floats for display, and there were differences on PC and on Mac. It literally rounded something like 18.25 to 18.2 on one platform and 18.3 on the other. This led to all sorts of other bugs as some parts of the program used text to transmit data, which ended up in weird states.
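A small Python sketch of why a value like 18.25 is a likely trouble spot, assuming the value really was an exact tie (the comment only says "something like 18.25"). 18.25 is 73/4, so a double stores it exactly, and the one-decimal result then depends entirely on the tie-breaking rule. The `decimal` module is used here only to make the two rules explicit.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

value = 18.25

# Decimal(float) converts the stored binary value exactly, with no rounding.
is_exact = Decimal(value) == Decimal("18.25")

step = Decimal("0.1")
half_even = Decimal(value).quantize(step, rounding=ROUND_HALF_EVEN)
half_up = Decimal(value).quantize(step, rounding=ROUND_HALF_UP)

print(is_exact, half_even, half_up)  # True 18.2 18.3
```

Two formatting routines that disagree on tie-breaking would produce exactly the 18.2 versus 18.3 split described above.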

The culprit was this terrible method. If you want anything approaching consistency or predictability, do not use formatting to round floating-point numbers. Pick a numerically stable method, which will be much faster if done correctly.

Coincidentally, C/C++ do not require any of their formatting and parsing routines to round-trip floating-point values correctly (except the newly added hex-formatted floats, which are a direct binary representation, and a newly added function that allows an obscure trick I do not recall at the moment...)
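To illustrate the hex-float point in Python rather than C: `float.hex()` and `float.fromhex()` use the same hexadecimal notation as C99's `%a`, and the round trip is exact because the text is a direct rendering of the binary significand and exponent. This is a sketch of the idea, not a statement about any particular C library.

```python
value = 0.1

# Hexadecimal text is a direct rendering of the stored bits.
text = value.hex()
restored = float.fromhex(text)

# A short decimal rendering, by contrast, can lose information.
lossy = float("%.6g" % (1 / 3))

print(text)               # 0x1.999999999999ap-4
print(restored == value)  # True
print(lossy == 1 / 3)     # False
```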

> The C/C++ standards do not require formatting to round correctly or even be portable.

The linked-to method uses PyOS_snprintf(). Its documentation at https://docs.python.org/3/c-api/conversion.html says:

"""PyOS_snprintf() and PyOS_vsnprintf() wrap the Standard C library functions snprintf() and vsnprintf(). Their purpose is to guarantee consistent behavior in corner cases, which the Standard C functions do not."""

And those functions in C/C++ are not specified.

The Python wrapper also does not specify it in this case, so you should not use it for rounding, or you will have the same problem. Nowhere on that page does it say that proper rounding will be cross-platform.

Simply do it with floats. There are perfectly good, numerically stable, fast rounding methods, that avoid all this nonsense.
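The comment does not say which float-only method it has in mind. One common approach, shown here purely as an assumption, is scale, add one half, floor, and unscale. The name `round_half_up` is hypothetical, and the sketch has known weak spots: inputs that are not exactly representable (such as 1.005), very large magnitudes, and values just below a half where the addition itself rounds.

```python
import math


def round_half_up(value: float, ndigits: int = 0) -> float:
    """Round to ndigits decimals, sending exact ties toward +infinity."""
    scale = 10.0 ** ndigits
    # Shift the wanted digit to the units place, bias by one half, truncate down.
    return math.floor(value * scale + 0.5) / scale


print(round_half_up(18.25, 1))  # 18.3
print(round_half_up(2.5))       # 3.0
print(round_half_up(-2.5))      # -2.0
```

Because the tie rule is written out in the code, the result does not depend on what a platform's formatting routine happens to do.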

My response was to the fact that the comment said the method of format strings was one of “the best ways to go about it”.

It’s obvious that PyOS_snprintf is not a standard library function.

Sure, PyOS_snprintf isn't a standard library function, but it's a thin wrapper to snprintf, which is. Python/mysnprintf.c is the location of the:

   snprintf() wrappers.  If the platform has vsnprintf, we use it, else we
   emulate it in a half-hearted way.  Even if the platform has it, we wrap
   it because platforms differ in what vsnprintf does in case the buffer
   is too small:
It mentions that one corner case is what happens when the buffer is too small, not rounding issues.

The "the best ways to go about it" comment links to the protobuf code, which also uses snprintf.

The (what I think is the) relevant C99 spec at http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf says:

> For e, E, f, F, g, and G conversions, if the number of significant decimal digits is at most DECIMAL_DIG, then the result should be correctly rounded. If the number of significant decimal digits is more than DECIMAL_DIG but the source value is exactly representable with DECIMAL_DIG digits, then the result should be an exact representation with trailing zeros. Otherwise, the source value is bounded by two adjacent decimal strings L<U, both having DECIMAL_DIG significant digits; the value of the resultant decimal string D should satisfy L ≤ D ≤ U, with the extra stipulation that the error should have a correct sign for the current rounding direction.

So either 1) "The C/C++ standards do not require formatting to round correctly or even be portable.", in which case Python and protobuf are doing it wrong and somehow this issue was never detected, or 2) the C/C++ standards do require correct rounding, but the case described by ChrisLomont didn't quite meet the spec requirements to get precision and rounding modes to match across platforms, or 3) I don't know what I'm talking about.

The problem is that "correctly rounded" is implementation-defined. You cannot do it portably, and you cannot query it portably. As such, different platforms, compilers, etc. do it differently. Thus using formatting for rounding is inconsistent.

Here's [1] where you can query the current floating-point environment in C: "Specifics about this type depend on the library implementation".

Here's [2] where you can set some rounding modes in C++: "Additional rounding modes may be supported by an implementation." Note that these do not include banker's rounding by default, which is used to make many scientific calculations more stable (it lowers errors and drift in accumulated calculations). Many platforms do this by default, but it's not in the standard.

You can chase down this rabbit hole. I (and several others) did during the issue on the last project, and we found it was well known in numerics circles that this is not a well-defined process in C/C++. If it were, printing and parsing would round-trip, which they did not before a recent C++ addition, and even now that is only guaranteed in a special case.

[1] http://www.cplusplus.com/reference/cfenv/fenv_t/

[2] https://en.cppreference.com/w/cpp/numeric/fenv/FE_round

Thank you for the clarification!

They document that their goal is correctness in edge cases that other standard C functions don’t guarantee. Seems obvious to say this is “one of the best” ways to go about it. It’s absolutely true given this stated and documented goal of correctness, which would be a very commonly needed property.

Other good ways could trade-off edge case comprehensiveness for performance or whatever. That doesn’t make this way less good.

IEEE 754 defines 5 rounding modes. This one sounds like nearest, ties to even. Not all decimal values are representable as floats. Depending on whether you compile for x87 (80-bit internal representation) or SSE (64-bit), you might get slightly different results.

I'm well aware of that, having written at length about floating-point tricks, numerical issues, etc.

The issue here is that you don't know what a library that formats a float does, and if the function is not specified clearly (as in C/C++), you have no way of knowing what you will get.

Thus I said to do it yourself, using proper numerics.

They already had the same float, though. That's not very likely if the rounding modes were different.

For what it's worth, it looks like different standard libraries make different choices on whether float->string conversion cares about the current rounding mode.

At some point in 3.x Python moved to "bankers rounding" which is slightly less biased than the one we learn in school, perhaps C++ did the same. Might be a factor in the discrepancy.
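For reference, this is what banker's rounding looks like with Python 3's built-in `round()` on exact ties; each tie goes to the nearest even integer. It only illustrates Python's behavior and says nothing about what any C++ library does.

```python
ties = [0.5, 1.5, 2.5, 3.5]

# Each value is exactly halfway between two integers.
rounded = [round(x) for x in ties]

print(rounded)  # [0, 2, 2, 4]
```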
Yep, I'd be curious what any better alternative is.

Consider that a float's internal representation is in base 2, and you're trying to round in base 10. Even if you didn't use a string, I'd assume you'd have to create an array of ints that contain base 10 digits in order to do the rounding, unless there are some weird math tricks that can be employed that can avoid you having to process all base 2 digits. And an array of ints isn't all that computationally different from a string.
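One way to round in base 10 without building a digit array, offered as a sketch rather than a recommendation: every finite double is an exact rational number, so exact integer arithmetic can do the scaling and rounding. The name `round_decimal_exact` is hypothetical, `ndigits` is assumed to be non-negative, and this trades speed for exactness since it relies on arbitrary-precision integers.

```python
from fractions import Fraction


def round_decimal_exact(value: float, ndigits: int) -> Fraction:
    """Round a finite float to ndigits >= 0 decimals, ties to even, exactly."""
    scale = 10 ** ndigits
    # Fraction(float) is exact, so scaling introduces no rounding error.
    scaled = Fraction(value) * scale
    # round() on a Fraction returns an int, with ties going to even.
    return Fraction(round(scaled), scale)


print(round_decimal_exact(18.25, 1))  # 91/5, i.e. 18.2
print(round_decimal_exact(2.675, 2))  # 267/100, since the stored value is below 2.675
```

The second result shows the effect of the base-2 representation: 2.675 is not exactly representable, the stored double is slightly smaller, and so it is not a tie at all.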

In fact, I don't know what cdecimal does now, but back when decimal.Decimal was pure Python it would store the "decimal number" as a string and manipulate that.