by dzaima
498 days ago
"doesn't affect other things" is in no way ever whatsoever a reason to do anything in any way related to thinking, believing, or hoping that it might, would, or should affect nothing, especially around compiler optimizations (fun sentence to write, and one I violate myself for some things, but trivially true regardless). I'd be curious to hear about actual issues though. The ppc64 case looks like LLVM very intentionally not using the existing square root instruction, instead emitting a sequence of manual operations that supposedly runs faster. It's entirely within its rights to do so, and it should affect no correct code (not that it's even really possible to write "correct code" under -ffast-math).
|
Some broader context is probably warranted though. This originated out of a discussion with the authors of P3375 [0] about the actual performance costs of reproducibility. I suspected that existing compilers could already do it without a language change and at no runtime cost, using inline assembly magic. This library was the experiment to see if I was full of it.
It found only a few minor limitations. One was this issue, which happens "outside" what the library is attempting to do (though potentially still fixable, as your godbolt link demonstrated). Another was that Clang and GCC have slightly different interpretations of "creative" register constraints. Clang's interpretation is closer to GCC's docs than GCC's own behavior is, but it produces worse code.
Otherwise, this gives you reproducibility right up to the deeper compiler limitations like NaN propagation, at essentially no performance cost. Across all 3 major compilers and even the minor ones I tried, I wasn't able to find any "real" cases where it's not reproducible, only incredibly specific situations like this one.
[0] https://isocpp.org/files/papers/P3375R2.html
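
For anyone wondering what "inline assembly magic" can look like, here is a minimal sketch of the general idea. It is not the library's actual code. The names `opaque` and `mul_add_unfused` are made up for illustration, and the register constraints are the common GCC/Clang ones for x86-64 and AArch64, with a memory round-trip as a fallback elsewhere. An empty asm statement that claims to modify a value hides where the value came from, so the compiler has to round the product before the add and can't contract the two into an FMA.

```cpp
#include <cmath>

// Hypothetical helper, not taken from the library under discussion.
// The empty asm statement claims to read and modify `x`, so the compiler
// must treat the result as unknown and cannot merge the operation that
// produced `x` with whatever consumes it afterwards.
inline double opaque(double x) {
#if defined(__x86_64__) && defined(__SSE2__)
    __asm__("" : "+x"(x));  // x stays in an SSE register; no instruction is emitted
#elif defined(__aarch64__)
    __asm__("" : "+w"(x));  // x stays in a SIMD/FP register; no instruction is emitted
#else
    __asm__("" : "+m"(x));  // assumed fallback: forces a round-trip through memory
#endif
    return x;
}

// a * b + c with two separate roundings, even if contraction
// (e.g. -ffp-contract=fast) would otherwise allow a fused multiply-add.
inline double mul_add_unfused(double a, double b, double c) {
    return opaque(a * b) + c;
}
```

With a = 1 + 2^-27 and c = -(1 + 2^-26), `mul_add_unfused(a, a, c)` gives exactly 0 because the product is rounded before the add, while `std::fma(a, a, c)` keeps the extra 2^-54. On the two register-constraint paths the barrier costs no instructions, which is the sense in which this kind of trick can be free at runtime.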