by wolfgke 3151 days ago
> If it's exactly reproducible on different hardware and software configurations, as seems to be one of the goals, that alone would be a useful improvement on IEEE754.

IEEE754 already supports that (except for the minor fact that NaNs can in theory be represented by different bit sequences; in practice the same ones are used everywhere). The problem is rather that typical programming languages (such as C and C++) have no proper support for IEEE754.

And you have to be very precise about your intentions: for each of the operations defined in IEEE754 you have to specify which of the four rounding modes (plus one optional one in the 2008 revision) is to be used. You also have to be precise about whether a MAC (a fused multiply-add, rounded once) or a multiply (then round) followed by an add (then round again) is to be used. This in turn has implications: depending on what the compiler makes of it, either the program will not run on processors without MAC support, or the code has to fall back to a (slow) software emulation.

3 comments

> except for the minor fact that the NaN types can in theory be represented by different bit sequences - in practice all the same are used

I can’t say it’s terribly common, but there’s always someone like me out there going “Ooh, 51 free bits! Don’t mind if I do.” It’s a common trick for value representation in implementations of dynamic languages.
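For readers who haven't seen the trick, here is a rough sketch of stashing data in those bits, assuming a 64-bit IEEE 754 `double`. The names `box_payload` and `unbox_payload` and the two masks are hypothetical; real dynamic-language runtimes use their own tagging schemes on top of this.

```c
#include <stdint.h>
#include <string.h>

/* Quiet NaN for binary64: exponent all ones plus the top mantissa bit. */
#define QNAN_BITS    UINT64_C(0x7ff8000000000000)
/* The 51 mantissa bits below the quiet bit. */
#define PAYLOAD_MASK UINT64_C(0x0007ffffffffffff)

/* Build a quiet NaN that carries up to 51 bits of payload. */
double box_payload(uint64_t payload) {
    uint64_t bits = QNAN_BITS | (payload & PAYLOAD_MASK);
    double d;
    memcpy(&d, &bits, sizeof d);   /* well-defined way to reinterpret bits */
    return d;
}

/* Recover the payload from a boxed value. */
uint64_t unbox_payload(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits & PAYLOAD_MASK;
}
```

A payload of zero yields the ordinary quiet NaN, so a real scheme would reserve some bit patterns to tell "boxed value" apart from "actual NaN result".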

Anyway, poor floating-point support is about as prevalent as poor Unicode support—the common cases seem to work, lulling you into a false sense of security before you discover that the edge cases are untested. I’ve seen bugs caused by a green thread getting rescheduled onto a different OS thread with a different rounding mode.

There are a few other very minor details that can vary between platforms:

- In binary floating-point, implementations are allowed to "detect tininess" either "before" or "after rounding". ARM and PPC detect it before rounding, x86 detects it after. This only changes whether or not the underflow flag is set for results in a tiny 1/4-ulp-wide interval, and only affects multiplication, fma, and conversion (results from the other basic operations cannot land in that interval). Since almost no one cares about flags, this is not a big deal; if flush-to-zero is enabled, however, it will perturb results that land in this band.

- Implementations are allowed to set or not set the invalid flag for fma(0, inf, quiet nan). Again, almost no one cares about flags, so no problem, but if invalid is unmasked, this affects whether or not you trap.

The bigger issue, as you say, is that C/C++ leave the width of intermediate expression evaluation up to the implementation (but the compiler has to say what it does via FLT_EVAL_METHOD, so you can refuse to compile if the compiler doesn't do what your program needs).
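The "refuse to compile" part can be sketched like this, assuming a C99 or later compiler whose `<float.h>` defines `FLT_EVAL_METHOD`. The function `add_then_sub` is just a made-up example of code whose result depends on intermediate width.

```c
#include <float.h>

/* Bail out at compile time unless float and double expressions are
   evaluated in their own type (FLT_EVAL_METHOD == 0). */
#if !defined(FLT_EVAL_METHOD) || FLT_EVAL_METHOD != 0
#error "this code assumes no extended-precision intermediates"
#endif

/* With double intermediates, x + y is rounded before y is subtracted.
   With wider intermediates the answer can differ. */
double add_then_sub(double x, double y) {
    return (x + y) - y;
}
```

With double evaluation, `add_then_sub(1.0, 0x1p53)` gives 0, because 2^53 + 1 is not representable and rounds back to 2^53. An implementation that kept the sum in a wider format could return 1 instead.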

Thanks for these details about the flags. I really was not aware that such differences exist.
Sure, I meant “relatively easily reproducible in portable C code”.

I may be wrong, but it’s my understanding that even given the precise rounding modes, fusing, etc., the results could still differ, as implementations are allowed to use varying extended precision internally (and do).

For example, as far as I know it’s difficult to guarantee identical FPU results on x86 and ARM.

[Edit to add: I guess I'm complaining about the popular implementations rather than the IEEE spec itself, but for ordinary users like me it amounts to the same thing. Overall IEEE 754 is wonderful, so it's exciting to see a proposal for something even better!]

> For example, as far as I know it’s difficult to guarantee identical FPU results on x86 and ARM.

Can you give details/resources on how it is difficult to obtain identical FPU results on x86 and ARM?

Does this even hold if you program in assembly using either

- only the primitives that are defined exactly in IEEE 754-2008 (i.e. not some functions defined in some, say, C library)

or

- using "identical" implementations of more complex functions (i.e. not the IEEE 754 primitives; think of cos, erf, gamma, ...)?

> Can you give details/resources on how it is difficult to obtain identical FPU results on x86 and ARM?

Not personally -- having already run into problems getting reproducible results on a single x86 machine with different compilers, I haven't even tried getting ARM to match!

Here's a long list of links on various issues: https://gafferongames.com/post/floating_point_determinism/

And here's a blog post that goes into detail on how to get reproducible results on x86: http://yosefk.com/blog/consistency-how-to-defeat-the-purpose... It's not too bad, but it sounds a bit fragile and I have no idea how well it would translate to other architectures. [Edit to add: hmm, actually, that post does say this is mostly just an x86 problem, or rather x87]

> using "identical" implementations of more complex functions

That's a rather onerous requirement! Especially for multi-platform work. In almost all cases I'd like to be able to use the platform's math library, which is presumably well-optimized. Is there a good, reasonably efficient, highly portable implementation of math.h that gives fully reproducible results?

It seems to me (but I'd love to be convinced otherwise!) that if you really want reproducible results, fixed-point is a much better road to go down. Ints are just a lot more consistent than floats on pretty much every platform with a C compiler.
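As a rough illustration of the fixed-point route, here is a Q16.16 sketch built only on `<stdint.h>` integer arithmetic. The type and function names (`q16_16`, `q_from_int`, `q_mul`) are made up, and it assumes inputs and results stay within range; there is no overflow handling.

```c
#include <stdint.h>

/* Q16.16: 16 integer bits, 16 fractional bits, stored in an int32_t. */
typedef int32_t q16_16;
#define Q_ONE ((q16_16)1 << 16)

/* Convert a small integer (|n| < 32768) to Q16.16. */
q16_16 q_from_int(int32_t n) {
    return (q16_16)(n * Q_ONE);
}

/* Multiply two Q16.16 values using a 64-bit intermediate.
   Division is used instead of >> so that negative values truncate
   toward zero on every C99 compiler. */
q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * b) / Q_ONE);
}
```

For example, `q_mul(q_from_int(3), Q_ONE / 2)` is 1.5 in Q16.16, and the bit pattern should be the same on any platform with a conforming compiler, since no floating-point hardware is involved.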