Hacker News
by jcranmer 1387 days ago
The problem here is that enabling the FTZ/DAZ flags means modifying global (technically thread-local) state, which is relatively expensive. Ideally, you'd twiddle these flags only around code that wants to run in this mode, but given the cost of the operation, it isn't practical to automatically add the twiddling to every function call, and doing it manually is somewhat challenging because compilers tend to support access to the floating-point status rather poorly. Also, FTZ/DAZ aren't part of IEEE 754, so there's no portable function for twiddling these bits as there is for the rounding mode or exception controls. I will note that icc's -fp-model=fast and MSVC's /fp:fast correctly do not link code with crtfastmath.

As a side note, this kind of thing is why I think a good title for a fast-math would be "Fast math, or how I learned to start worrying and hate floating point."


I don't think flipping these flags is expensive. Can you provide a source for that? AFAICT modern microarchitectures are going to register-rename that into the u-ops issued to the functional units, rather than flush the entire ROB.
https://www.agner.org/optimize/instruction_tables.pdf, search for MXCSR (LDMXCSR and STMXCSR instructions).

Keep in mind that twiddling these flags is going to require saving the MXCSR register to memory, or'ing or and'ing bits in memory, and then loading that memory back into MXCSR. Both saving and loading MXCSR require stalls, because floating-point operations both read and write that register. So you need, at minimum, 4 L1 cache hits and 2 partial pipeline flushes to twiddle an MXCSR bit.
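For concreteness, here is a minimal sketch of that save / modify / reload sequence in C, assuming an x86 target with SSE and a compiler that provides the `_mm_getcsr` / `_mm_setcsr` intrinsics from `<xmmintrin.h>` (these typically compile to STMXCSR and LDMXCSR). The helper names and the `MXCSR_FTZ` / `MXCSR_DAZ` macros are made up for illustration; the bit positions (15 and 6) are the architectural ones.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr (x86 SSE only) */

/* Illustrative names; the bit positions are those of the MXCSR register. */
#define MXCSR_DAZ 0x0040u  /* bit 6: denormals-are-zero (inputs) */
#define MXCSR_FTZ 0x8000u  /* bit 15: flush-to-zero (results)    */

/* Read-modify-write: store MXCSR, OR in the two bits, load it back.
   Returns the previous value so the caller can restore it later. */
static unsigned int enable_ftz_daz(void)
{
    unsigned int old = _mm_getcsr();            /* store MXCSR */
    _mm_setcsr(old | MXCSR_FTZ | MXCSR_DAZ);    /* load MXCSR  */
    return old;
}

/* Put back whatever value enable_ftz_daz() returned. */
static void restore_mxcsr(unsigned int old)
{
    _mm_setcsr(old);
}
```

A caller would wrap the denormal-heavy region in `unsigned int old = enable_ftz_daz(); ... restore_mxcsr(old);`. As far as I know the compiler does not model MXCSR as an input to floating-point operations, so nothing in this sketch stops it from moving arithmetic across the two calls.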

(As far as I'm aware, modern microarchitectures generally don't register-rename the floating-point status register.)

Looks like many x86 cores still do rename MXCSR, though Gracemont notably doesn't: https://chipsandcheese.com/2021/12/21/gracemont-revenge-of-t...

Note that you wouldn't necessarily need to do a read-modify-write: in most cases it'd suffice to just save the old value and then overwrite the whole MXCSR for the scope requiring special treatment.
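A rough sketch of that variant in C, again assuming x86 with SSE and the `<xmmintrin.h>` intrinsics. Instead of OR'ing bits into the saved value, it loads a fixed constant: 0x9FC0 is the power-on default MXCSR value 0x1F80 (all exceptions masked, round to nearest) with the FTZ and DAZ bits added. The function and macro names are hypothetical.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr (x86 SSE only) */

/* Default MXCSR (0x1F80) plus FTZ (0x8000) and DAZ (0x0040).
   Illustrative name, not from any header. */
#define MXCSR_SCOPE_FTZ_DAZ 0x9FC0u

/* Enter the scope: remember the caller's MXCSR, then overwrite the whole
   register with a known constant (no dependent OR on the stored value). */
static unsigned int ftz_daz_scope_enter(void)
{
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(MXCSR_SCOPE_FTZ_DAZ);
    return saved;
}

/* Leave the scope: reload the caller's MXCSR exactly as it was. */
static void ftz_daz_scope_leave(unsigned int saved)
{
    _mm_setcsr(saved);
}
```

The trade-off is that the caller's rounding mode and exception masks are replaced by the defaults inside the scope, and any exception flags raised inside it are discarded when the saved value is reloaded.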

Also worth noting that it's not the entire MXCSR that needs to be renamed, but just a handful of status bits, so the logic is likely even cheaper than renaming a GPR.