by rasz 1481 days ago
Both versions use SSE and are pipelined. The problem with the second one is a data dependency: it has only two adds, but the second add depends directly on the first one's result, which causes a stall.
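To make the stall concrete, here is a generic sketch of a dependency chain. It is not the code being discussed, and the function names `sum_dependent` and `sum_two_chains` are made up for illustration. In the first function every add has to wait for the previous add's result; the second keeps two independent chains that the pipeline can overlap.

```c
#include <stddef.h>

/* One accumulator: each add depends on the previous add's result,
   so the adds form a single serial chain. */
double sum_dependent(const double *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Two accumulators: the two chains do not depend on each other,
   so consecutive adds can overlap in the pipeline. */
double sum_two_chains(const double *x, size_t n) {
    double a = 0.0, b = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        a += x[i];
        b += x[i + 1];
    }
    if (i < n)              /* odd leftover element */
        a += x[i];
    return a + b;
}
```

Note that splitting the sum changes the order of the floating-point additions, so for general inputs the two functions can differ in the last bits. That is also why a compiler will usually not do this transformation on its own without something like -ffast-math.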

SSE includes "scalar" adds (addsd), which operate on a single floating-point value per instruction. These are "non-SIMD" instructions that serve as a replacement for the legacy x87 instructions.

There are also "parallel" (packed) adds (addpd), which add two doubles per instruction.

Look carefully at the assembly: the 1st version uses parallel adds (addpd) and parallel multiplies, while the 2nd version uses scalar adds (addsd).

The other major point is that the 2nd version uses a single qword (64-bit) move per loop iteration, while the 1st version uses a full 128-bit move per loop iteration.
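A minimal sketch of the scalar vs. parallel difference, written with SSE2 intrinsics from `<emmintrin.h>` (x86/x86-64 only). This is not the code under discussion; `add_scalar` and `add_packed` are made-up names, and the instruction names in the comments are what a compiler typically emits for these intrinsics, not a guarantee.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */

/* Scalar: one 64-bit load per operand and a one-lane add (addsd)
   per element. */
void add_scalar(const double *a, const double *b, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        __m128d va = _mm_load_sd(&a[i]);   /* loads 1 double, upper lane zeroed */
        __m128d vb = _mm_load_sd(&b[i]);
        _mm_store_sd(&out[i], _mm_add_sd(va, vb));
    }
}

/* Packed: one 128-bit load per operand and a two-lane add (addpd),
   so each iteration handles two doubles. */
void add_packed(const double *a, const double *b, double *out, size_t n) {
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        __m128d va = _mm_loadu_pd(&a[i]);  /* loads 2 doubles, unaligned OK */
        __m128d vb = _mm_loadu_pd(&b[i]);
        _mm_storeu_pd(&out[i], _mm_add_pd(va, vb));
    }
    if (i < n)                             /* odd leftover element */
        out[i] = a[i] + b[i];
}
```

Both functions produce the same results; the packed one just needs about half as many loads, adds, and stores for the same array.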

---------

SSE is used for scalar double-precision math these days because scalar SSE is faster than the x87 instructions and better matches the standards. x87 computes intermediate results at a "higher precision" (80-bit extended) than IEEE 64-bit double, so it can give different results from other computers; SSE is closer to the specs.
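One way to see which behavior a given build has is the C99 macro `FLT_EVAL_METHOD` from `<float.h>`. The wrapper below is only a sketch, and `eval_method` is a made-up name. The "typical" values in the comment reflect common compiler behavior (for example GCC targeting x86-64 vs. 32-bit x87 math), not something the C standard promises.

```c
#include <float.h>

/* Returns the C99 FLT_EVAL_METHOD value for this build:
    0 = operations are evaluated in the operand's own type
        (typical when the compiler uses scalar SSE math),
    1 = float and double are evaluated as double,
    2 = everything is evaluated as long double
        (typical when the compiler uses x87 math),
   -1 = indeterminate. */
int eval_method(void) {
    return FLT_EVAL_METHOD;
}
```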