Hacker News
by kr7 3399 days ago
The solution is to compile with SSE2 on x86. (flags: -mfpmath=sse -msse -msse2)

On x86-64, the compiler should default to SSE2.

SSE2 is ~16 years old so compatibility shouldn't be an issue.


Technically, you only need the instructions from the original SSE set to do floating-point operations. SSE2 adds a bunch of really useful integer and floating-point instructions.

But the only extra CPUs that gets you are the Pentium III, AMD Athlon XP, and AMD Duron.

SSE2 is supported on every single x86 CPU released after those, such as the Pentium 4, Pentium M, and Athlon 64.

It's a real shame that people are still using CPUs that don't support full SSE4 (SSE4.1/4.2), such as the AMD Phenom and Phenom II, which only have SSE4a. Otherwise everyone would have moved exclusively to SSE4.

SSE1 is single-precision only. SSE2 added double precision.

So the bug will still appear for 'double' using just SSE1.

Some Atoms only support up to SSSE3 too.