HN Mirror



	by StefanKarpinski 1685 days ago
	That's true, but the danger of flushing subnormals to zero is correspondingly worse, because it's global CPU state and commonly used code, like `libm`, relies on subnormals not being flushed to zero in order to work correctly. The example linked in the post is a case where loading a shared library that had been compiled with `-Ofast` (which includes `-ffast-math`) broke a completely unrelated package for exactly this reason. Of course, the fact that CPU designers made this a global hardware flag is atrocious, but they did, so here we are.
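One property that code can quietly depend on is that, with gradual underflow, `x != y` implies `x - y != 0` for finite doubles; flush-to-zero breaks it. A minimal C sketch, assuming default IEEE 754 behavior (built without `-ffast-math`, FTZ/DAZ off); `runtime_sub` and `subnormal_difference_survives` are hypothetical names:

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Returns x - y. The volatile copies stop the compiler from folding the
   subtraction at compile time, so the hardware's current FTZ/DAZ
   settings decide the result. */
double runtime_sub(double x, double y) {
    volatile double vx = x, vy = y;
    return vx - vy;
}

/* With gradual underflow, 1.5 * DBL_MIN - DBL_MIN = 0.5 * DBL_MIN, a
   subnormal. With flush-to-zero enabled it becomes 0.0, so an
   "x != y implies x - y != 0" assumption silently fails. */
bool subnormal_difference_survives(void) {
    double x = 1.5 * DBL_MIN;
    double y = DBL_MIN;
    double d = runtime_sub(x, y);
    return x != y && d != 0.0 && fpclassify(d) == FP_SUBNORMAL;
}
```

In a normally built program `subnormal_difference_survives()` should return true; if something loaded into the process has switched FTZ on, the same call can return false.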


Joker_vD 1685 days ago

Wait, what would a "local" CPU state or hardware flag be? In any case, since the x86-64 ABI doesn't require MXCSR to be in any particular state on function entry, libm should set or clear whatever control flags it needs on its own (and restore them on exit, since the MXCSR control bits are defined to be callee-saved).
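A routine that clears the flush-to-zero (FTZ, bit 15) and denormals-are-zero (DAZ, bit 6) bits of MXCSR on entry and restores the caller's control bits on exit might look like the sketch below. It assumes an x86-64 target and uses the `_mm_getcsr`/`_mm_setcsr` intrinsics from `<xmmintrin.h>`; the function name `sub_with_gradual_underflow` and the `MXCSR_*` constants are hypothetical, and on other targets the sketch just performs the subtraction.

```c
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define MXCSR_STATUS_MASK 0x003Fu /* bits 0-5: sticky exception flags */
#define MXCSR_DAZ 0x0040u         /* bit 6: denormals-are-zero (inputs) */
#define MXCSR_FTZ 0x8000u         /* bit 15: flush-to-zero (results) */
#endif

/* Computes a - b with gradual underflow, whatever FTZ/DAZ state the
   caller left in MXCSR, then restores the caller's control bits while
   keeping any exception flags raised in between. */
double sub_with_gradual_underflow(double a, double b) {
#if defined(__x86_64__) || defined(_M_X64)
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved & ~(MXCSR_FTZ | MXCSR_DAZ));
    volatile double va = a, vb = b;
    volatile double r = va - vb; /* volatile store keeps the subtraction before the restore */
    unsigned int status = _mm_getcsr() & MXCSR_STATUS_MASK;
    _mm_setcsr((saved & ~MXCSR_STATUS_MASK) | status);
    return r;
#else
    /* Not x86-64: no MXCSR to manage, so just do the subtraction. */
    volatile double va = a, vb = b;
    return va - vb;
#endif
}
```

Even in this small case the arithmetic is one instruction while the flag handling is several, which is the cost trade-off the replies below discuss.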


StefanKarpinski 1685 days ago

Local would mean not using register flags at all and instead indicating with each operation whether you want flushing or not (and, ideally, the rounding mode). Some libms may clear and restore the control flags and some may not. Libm is just an example here, and one where you're right that most of the function calls that might need to avoid flushing subnormals to zero are expensive enough that clearing and restoring flags is an acceptable cost. However, that's not always the case: sometimes the operation in question is only a few instructions long and may get inlined into other code. It might be possible to handle this better at the compiler level while still using the MXCSR register, but if it is, LLVM certainly can't currently do that well.
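To illustrate the "local" alternative, each operation can take its flushing choice as an argument instead of reading it from global register state. This is only a software emulation, assuming the hardware itself runs with gradual underflow; the names `fp_mode` and `sub_op` are hypothetical, and real per-operation control would have to come from the instruction encoding.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical per-operation mode: each call site states whether
   subnormal results should be flushed, instead of relying on a global
   control register such as MXCSR. */
typedef struct {
    bool flush_subnormals;
} fp_mode;

double sub_op(double a, double b, fp_mode mode) {
    volatile double va = a, vb = b;
    double r = va - vb;
    if (mode.flush_subnormals && fpclassify(r) == FP_SUBNORMAL) {
        r = signbit(r) ? -0.0 : 0.0; /* flush to a zero of the same sign */
    }
    return r;
}
```

Because the mode travels with the call, inlining `sub_op` into other code cannot leak a flushing setting into unrelated operations.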


simonbyrne 1685 days ago

In theory, every function should do that, and check things like the rounding mode as well. But that would be pretty slow, especially for low-latency operations (modifying MXCSR will disrupt pipelining, for example).


pcwalton 1685 days ago

That wouldn't be practical. C math library performance really matters for numerically intensive apps like games.
