Hacker News new | ask | show | jobs
by StefanKarpinski 1685 days ago
That's true, but the danger of flushing subnormals to zero is correspondingly worse because it's global CPU state and there's commonly used code that relies on not flushing subnormals to zero in order to work correctly, like `libm`. The example linked in the post is of a case where loading a shared library that had been compiled with `-Ofast` (which includes `-ffast-math`) broke a completely unrelated package because of this. Of course, the fact that CPU designers made this a global hardware flag is atrocious, but they did, so here we are.
1 comments

Wait, what is "local" CPU state/hardware flag? In any case, since x64 ABI doesn't require MXCSR to be in any particular state on function entry, libm should set/clear whatever control flags it needs on its own (and restore them on exit since MXCSR control bits are defined to be callee-saved).
Local would be not using register flags at all and instead indicating with each operation whether you want flushing or not (and rounding mode, ideally). Some libms may clear and restore the control flags and some may not. Libm is just an example here and one where you're right that most of function calls that might need to avoid flushing subnormals to zero are expensive enough that clearing and restoring flags is an acceptable cost. However, that's not always the case—sometimes the operation in question is a few instructions and it may get inlined into some other code. It might be possible to handle this better at the compiler level while still using the MXCSR register, but if it is, LLVM certainly can't currently do that well.
In theory, every function should do that to check things like rounding mode etc. But that would be pretty slow, especially for low-latency operations (modifying mxcsr will disrupt pipelining for example).
That wouldn't be practical. C math library performance really matters for numerical-intensive apps like games.