Hacker News
by stephencanon 530 days ago
There are basic integer operations in the FP/SIMD units on most CPUs, so there’s generally no need to “move back and forth” unless you need to branch on the result of a comparison, use a value as an address, or do some more specialized arithmetic.
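To make that concrete, here is a minimal sketch (not from the comment) of an integer operation applied to float data without leaving the SIMD register file. It assumes x86 with SSE2 and falls back to scalar code elsewhere. The function name `scale_by_pow2` is made up for illustration, and the exponent trick only holds for normal, nonzero inputs whose results stay in range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: multiply four floats by 2^k by adding k to each
   exponent field with an integer add. Valid only for normal, nonzero
   inputs whose results neither overflow nor underflow. */
void scale_by_pow2(float v[4], int k)
{
    assert(k > -127 && k < 127);
    const int delta = k * (1 << 23);   /* k shifted into the exponent field */
#if defined(__SSE2__)
    __m128  x    = _mm_loadu_ps(v);
    __m128i bits = _mm_castps_si128(x);              /* reinterpret only, no data movement */
    bits = _mm_add_epi32(bits, _mm_set1_epi32(delta)); /* integer add on the same xmm register */
    _mm_storeu_ps(v, _mm_castsi128_ps(bits));
#else
    /* Portable scalar fallback doing the same bit manipulation. */
    for (int i = 0; i < 4; ++i) {
        uint32_t b;
        memcpy(&b, &v[i], sizeof b);
        b += (uint32_t)delta;
        memcpy(&v[i], &b, sizeof b);
    }
#endif
}
```

The casts are free reinterpretations; the value never goes through a general-purpose register or memory between the float load and the integer add.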

On x86 there is sometimes (depending on the specific microarchitecture) an extra cycle of latency when using an integer operation on an xmm register last used with a float operation.

I have seen it explained as the integer and float ALUs being physically distant, so the forwarding network needs an extra cycle to transport the operands.

This is correct, but it’s happily pretty rare for it to matter in practice (because the domain bypass penalty is small and does not directly impact throughput, only latency).
(For that matter, though, most modern FP/SIMD units have a direct approximate-reciprocal instruction that is single-cycle throughput or better and much more accurate, generally around 10-12 bits, so there's no need for this sort of thing. See, e.g., FRECPE on ARM NEON and [V]RCPPS on x86; a packed-double version only arrives with VRCP14PD in AVX-512.)
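For reference, a rough sketch of how those estimate instructions are reached from C through intrinsics. The function name `approx_recip4` is invented for this example, and the single Newton-Raphson refinement step is my addition rather than something the comment describes. Without SSE or NEON it falls back to a plain division.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: approximate 1/x for four floats using the
   hardware reciprocal estimate plus one Newton-Raphson step,
   r' = r * (2 - x * r), which roughly doubles the accurate bits. */
void approx_recip4(const float x[4], float out[4])
{
    assert(x != NULL && out != NULL);
#if defined(__SSE__)
    __m128 a = _mm_loadu_ps(x);
    __m128 r = _mm_rcp_ps(a);                       /* RCPPS estimate */
    r = _mm_mul_ps(r, _mm_sub_ps(_mm_set1_ps(2.0f), /* refinement step */
                                 _mm_mul_ps(a, r)));
    _mm_storeu_ps(out, r);
#elif defined(__ARM_NEON)
    float32x4_t a = vld1q_f32(x);
    float32x4_t r = vrecpeq_f32(a);                 /* FRECPE estimate */
    r = vmulq_f32(r, vrecpsq_f32(a, r));            /* FRECPS computes 2 - a*r */
    vst1q_f32(out, r);
#else
    /* No estimate instruction available: exact division instead. */
    for (int i = 0; i < 4; ++i)
        out[i] = 1.0f / x[i];
#endif
}
```

If the raw estimate is accurate enough for the use at hand, the refinement line can simply be dropped.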