by jcranmer 789 days ago
Um... no. This is 100% completely and totally wrong.

x86-64 requires the hardware to support SSE2, which has native single-precision and double-precision instructions for floating-point (e.g., scalar multiply is MULSS and MULSD, respectively). Both the single precision and the double precision instructions will take the same time, except for DIVSS/DIVSD, where the 32-bit float version is slightly faster (about 2 cycles latency faster, and reciprocal throughput of 3 versus 5 per Agner's tables).

You might be thinking of x87 floating-point units, where all arithmetic is done internally using 80-bit floating-point types. But all x86 chips in like the last 20 years have had SSE units--which are faster anyways. Even back when x87 was the main floating-point unit, neither format was slower than the other, since all floating-point operations took the same time independent of format. It might be slower if you insisted that code compilation strictly follow IEEE 754 rules, but the solution everybody did was to not do that and that's why things like Java's strictfp or C's FLT_EVAL_METHOD were born. Even in that case, however, 32-bit floats would likely be faster than 64-bit for the simple fact that 32-bit floats can safely be emulated in 80-bit without fear of double rounding but 64-bit floats cannot.
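
To make the double-rounding point concrete, here is a minimal sketch in C that does a float division in a wider format and rounds back to float. The usual argument is that this is harmless when the wide format has at least 2p+2 bits of precision for a p-bit target: float has p = 24, so it needs 50 bits, and x87's 64-bit significand covers that, while double (p = 53) would need 108. All helper names are made up for illustration. The sketch assumes `long double` is the 80-bit x87 format, which is typical for GCC/Clang on x86 but not for MSVC, where it is the same as double (53 bits, still at least 50).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: tiny LCG so the sketch needs no external RNG. */
static uint32_t lcg_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Hypothetical helper: a float in [1, 2], which keeps the sketch away
   from zeros, subnormals, and infinities. */
static float next_float(uint32_t *state) {
    return 1.0f + (float)(lcg_next(state) >> 8) * (1.0f / 16777216.0f);
}

/* Divide in the wider format, then round a second time down to float.
   Assumption: long double is at least as precise as double. */
static float div_via_wide(float a, float b) {
    long double wide = (long double)a / (long double)b;
    return (float)wide;
}

/* Count inputs where the twice-rounded quotient differs from the
   quotient computed directly in float. */
long count_div_mismatches(long n) {
    assert(n > 0);
    uint32_t state = 12345u;
    long mismatches = 0;
    for (long i = 0; i < n; ++i) {
        float a = next_float(&state);
        float b = next_float(&state);
        float native = a / b;
        if (native != div_via_wide(a, b)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling `count_div_mismatches` with any positive count should report zero mismatches when the compiler is not using fast-math style optimizations. The same experiment with double as the target format is not guaranteed to come out clean, which is the asymmetry described above.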

1 comments

I agree with you. Thinking about it more, it should take the same time. I remember learning this in ~2016, and I ran a performance test on Skylake (Windows VS2015) which seemed to confirm it. I think I remember that I only tested with addsd/addss. Definitely not x87. But as always, if the result cannot be reproduced... I stand corrected until then.
I tried to reproduce it on Ivy Bridge (Windows VS20122) and failed (mulss and mulsd) [0]. Single and double precision take the same time. I also found that the first batch of iterations takes more time regardless of precision. It is possible that this tricked me last time.

[0] https://gist.github.com/dosshell/495680f0f768ae84a106eb054f2...
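
For anyone who wants to try this kind of test themselves, here is a rough sketch in C. It is not the code from the gist, and every function name in it is hypothetical. It times a dependent chain of multiplies (which an SSE build would typically compile to mulss or mulsd) and runs one untimed warm-up pass first, since the first batch was the slow one for me. The multiplier is read from a volatile so the compiler cannot fold the loop away; that trick and the use of `clock()` are my assumptions about a reasonable setup, not a rigorous benchmark harness.

```c
#include <assert.h>
#include <time.h>

/* Dependent chain of single-precision multiplies: each step needs the
   previous result, so the loop is bound by multiply latency. */
float mul_chain_f32(float x, float m, long n) {
    assert(n >= 0);
    for (long i = 0; i < n; ++i) {
        x *= m;
    }
    return x;
}

/* Same chain in double precision. */
double mul_chain_f64(double x, double m, long n) {
    assert(n >= 0);
    for (long i = 0; i < n; ++i) {
        x *= m;
    }
    return x;
}

/* CPU seconds for n single-precision multiplies, after one warm-up pass. */
double time_f32(long n) {
    volatile float one = 1.0f;  /* volatile read hides the constant */
    volatile float sink;        /* volatile write keeps the result alive */
    sink = mul_chain_f32(1.5f, one, n);  /* warm-up, not timed */
    clock_t start = clock();
    sink = mul_chain_f32(1.5f, one, n);
    clock_t stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* CPU seconds for n double-precision multiplies, after one warm-up pass. */
double time_f64(long n) {
    volatile double one = 1.0;
    volatile double sink;
    sink = mul_chain_f64(1.5, one, n);   /* warm-up, not timed */
    clock_t start = clock();
    sink = mul_chain_f64(1.5, one, n);
    clock_t stop = clock();
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Comparing `time_f32(n)` against `time_f64(n)` for a large `n` (hundreds of millions) and repeating the runs a few times should give a rough picture. Multiplying by 1.0 keeps the value constant, so overflow and denormals do not skew the timing.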

Sorry for the confusion and spreading false information.