Hacker News
by cogman10 828 days ago
Well, what's fun is that (AFAIK) trigonometric functions aren't implemented as instructions in the newer floating-point instruction sets such as SSE or AVX; only x87 has them (fsin, fcos, fptan, fpatan and friends).

So while what you say is true about the x87 implementation of those functions, for anything targeting a machine built in the last 20 years the code will likely run consistently regardless of the architecture (barring hardware floating-point bugs, which aren't terribly uncommon in the less significant bits, especially once overclocking comes into play).

x86-64 compilers won't use x87 instructions for float and double math, since SSE2 is always available there. x87 is just a really weird and funky instruction set that's best left in the gutter of history.


Sadly, even SSE vs. AVX is often enough to give different results, as SSE doesn't support fused multiply-add (FMA) instructions, which compute a*b + c with a single rounding instead of two. Even though this should allow CPUs from 2013 onward to all use FMA, gcc/clang don't enable AVX or FMA by default for x86-64 targets. And even if they did, results are only guaranteed identical if implementations have chosen exactly the same polynomial approximation method and no compiler optimization alters the instruction sequence.

Unfortunately, floating point results will probably continue to differ across platforms for the foreseeable future.

That's a bit of a different problem IMO.

Barring someone doing an "is AVX available?" check inside their code, binaries are generally compiled to target either SSE or AVX, not both. You can reasonably expect the same binary to produce the same output on different x86 CPUs.

This, of course, doesn't apply to a JIT. All bets are off if you are talking about JavaScript or the JVM.

That is to say, you can expect a C++ binary from the Ubuntu repos to produce the same numbers regardless of the machine, since distro packages generally target fairly old baseline architectures.

> Barring someone doing a "check if AVX is available" check inside their code

AFAIK that is exactly what glibc does internally: libm picks FMA/AVX2 variants of functions like sin and exp at load time.

GCC won't use FMA without -ffast-math, though, even when AVX is otherwise enabled.
Sure it will:

> -ffp-contract=fast enables floating-point expression contraction such as forming of fused multiply-add operations if the target has native support for them

> The default is -ffp-contract=off for C in a standards compliant mode (-std=c11 or similar), -ffp-contract=fast otherwise.

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...

Oh, wow, I forgot about fp-contract. It says it is off in C by default; what about C++?
Read more closely: it defaults to fast, not off.
I would have expected this to be a bug in the documentation. Why would they turn FMA off for standard-compliant C mode but not for standard-compliant C++ mode?

But the documentation does appear to be correct: https://godbolt.org/z/3bvP136oc

Crazy.

It defaults to off in standard-compliant mode, which in my mind was the default mode, as that's what we've used everywhere I have worked in the last 15 years. But of course that's not the case.

In any case, according to the sibling comment, the default is 'fast' even in std-compliant mode in C++, which I find very surprising. I'm not very familiar with that corner of the standard, but it must be looser than the equivalent wording in the C standard.

> x87 is just a really weird and funky instruction set that's best left in the gutter of history

Hmm, can you use long doubles in SSE or AVX? They are glorious, and as far as I can see from playing with Godbolt, they still require dirtying your hands with the x87 stack.

The 80-bit float? Not as far as I'm aware. However, it's fairly straightforward to represent a higher-precision float as a pair of 64-bit doubles (a "double-double", with roughly 106 bits of significand). And given how AVX/SSE work, you don't take much of a performance hit for doing that, as you are often operating on both halves of the pair with the same instruction.
Do you know if there's language support for that? Are there obscure gcc options that make "long double" be quadruple precision floats?
You can just use standard C _Float128 type https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html
Which language?

For C++, there's this: https://en.cppreference.com/w/cpp/types/floating-point

They do, however, have some intrinsics for trig functions in AVX in their compilers. Not as good as having an instruction, of course.
What about GPU ISAs?