
by AlotOfReading 498 days ago

Wasn't aware of __arithmetic_fence, though there's an open bug ticket noting that it doesn't protect against contraction (https://github.com/llvm/llvm-project/issues/91674). Still worth trying. I was aware of GCC's __builtin_assoc_barrier, but it wasn't documented to prevent contraction when I last checked. Appears they've fixed that since. Hadn't considered the IR / compilation speed issue. I'm aware of the broken optimizations, but they're not really a problem in practice as far as I can tell? You mainly lose some loop optimizations that weren't significant in the "serious" loop-heavy numerics code I tested against at work.

P3375 is mainly about contraction, but there are other issues that can crop up. Intermediate promotion occasionally happens, and I've also seen cases of intermediate expressions optimized down to constants without rounding error. Autovectorization is also a problem for me given the tendency of certain SIMD units to have FTZ set. I also have certain compilers that are less well-behaved than GCC and Clang in this respect.
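
A rough way to probe two of these on a given target, sketched in C. The helper names are hypothetical, and this only checks the scalar path in whatever floating-point mode the calling thread is in. It says nothing about what an autovectorized loop on a separate SIMD unit would do.

```c
#include <float.h>

/* Returns 1 if halving the smallest normal double gives zero
 * (subnormals are being flushed), 0 if gradual underflow is in effect.
 * The volatiles keep the compiler from folding the division at build time. */
int subnormals_flushed(void) {
    volatile double smallest_normal = DBL_MIN;
    volatile double half = smallest_normal / 2.0;
    return half == 0.0;
}

/* Reports the standard FLT_EVAL_METHOD macro from <float.h>.
 * 0 means expressions are evaluated in their own type; 1 or 2 mean
 * intermediates are promoted to double or long double; a negative
 * value means the behavior is indeterminable. */
int intermediate_eval_method(void) {
    return (int)FLT_EVAL_METHOD;
}
```

On a target with IEEE 754 gradual underflow and no fast-math style flags, subnormals_flushed() should return 0. A result of 1 suggests FTZ/DAZ is active for that code path.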

My concern isn't accuracy though. Compilers do that fine, no need to second guess them. My hot take is that accuracy is relatively unimportant in most cases. Most code is written by people who have never read a numerical analysis book in their life and built without a full awareness of the compiler flags they're using or what those flags mean for their program. That largely works out because small errors are not usually detectable in high level program behavior except as a consequence of non-reproducibility. I would much rather accept a small amount of rounding error than deal with reproducibility issues across all the hardware I work on.

1 comment

dzaima 497 days ago

> there's an open bug ticket noting that it doesn't protect against contraction (https://github.com/llvm/llvm-project/issues/91674).

Huh. ¯\_(ツ)_/¯

I didn't really mean the loop thing as much of a problem for the goal of reproducibility (easy enough to just not explicitly request a vector math library).

AArch32 NEON does have an implicit FTZ, and, yeah, that's annoying, though GCC and Clang don't use it without -ffast-math (https://godbolt.org/z/3b11dW559).

I do agree that getting consistent results would definitely make sense as the default.
