by titzer 1386 days ago
I don't think flipping these flags is expensive. Can you provide a source for that? AFAICT modern microarchitectures are going to register-rename that into the u-ops issued to the functional units, rather than flush the entire ROB.

See https://www.agner.org/optimize/instruction_tables.pdf and search for MXCSR (the LDMXCSR and STMXCSR instructions).

Keep in mind that twiddling these flags is going to require saving the MXCSR register to memory, or'ing or and'ing bits in memory, and then reading that memory back into MXCSR. And both saving and reading the MXCSR require stalls, because floating point operations both read and write that register. So you need, at minimum, 4 L1 cache hits and 2 partial pipeline flushes to twiddle an MXCSR bit.
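
For concreteness, here is a rough sketch of that read-modify-write sequence, assuming an x86 target with SSE and a compiler that ships `<xmmintrin.h>`. `_mm_getcsr` corresponds to STMXCSR and `_mm_setcsr` to LDMXCSR. The flush-to-zero bit is only an example flag, since the thread doesn't say which bits are meant, and the helper name `mxcsr_set_ftz` is made up for illustration.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON */

/* Hypothetical helper: set the flush-to-zero bit (an assumed example flag)
 * and return the previous MXCSR value so the caller can restore it. */
static unsigned int mxcsr_set_ftz(void)
{
    unsigned int old = _mm_getcsr();       /* STMXCSR: MXCSR goes out through memory */
    _mm_setcsr(old | _MM_FLUSH_ZERO_ON);   /* OR in the bit, then LDMXCSR from memory */
    return old;
}
```

A caller would keep the returned value and pass it to `_mm_setcsr` later to undo the change, which costs another LDMXCSR.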

(As far as I'm aware, modern microarchitectures generally don't register-rename the floating-point status register.)

Looks like many x86 cores still do rename MXCSR, though Gracemont notably doesn't: https://chipsandcheese.com/2021/12/21/gracemont-revenge-of-t...

Note that you wouldn't necessarily need to do a read-modify-write: in most cases it would suffice to save the old value and then reset the whole MXCSR for the scope requiring special treatment.
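
A minimal sketch of that save-and-reset pattern, again assuming x86 with SSE and `<xmmintrin.h>`. The scoped value here (the power-on default 0x1F80 plus flush-to-zero) and the names `SCOPED_MXCSR` and `sum_with_scoped_mxcsr` are assumptions chosen for illustration, not something taken from the thread.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON */

/* Assumed scoped setting: MXCSR power-on default (0x1F80: all exceptions
 * masked, round-to-nearest) with flush-to-zero enabled as an example. */
#define SCOPED_MXCSR (0x1F80u | _MM_FLUSH_ZERO_ON)

/* Hypothetical function: sum an array under the scoped MXCSR setting. */
static float sum_with_scoped_mxcsr(const float *x, int n)
{
    unsigned int saved = _mm_getcsr();  /* one STMXCSR to remember the old state */
    _mm_setcsr(SCOPED_MXCSR);           /* one LDMXCSR of a constant, no OR/AND step */

    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += x[i];

    _mm_setcsr(saved);                  /* restore the caller's MXCSR on the way out */
    return s;
}
```

One consequence of this design is that restoring the saved value also discards any sticky exception flags raised inside the scope, which may or may not be what you want.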

Also worth noting that it's not the entire MXCSR that needs to be renamed, but just a handful of status bits, so the logic is likely even cheaper than renaming a GPR.