by berkut 4687 days ago
This.

Assuming the compiler will generate good SSE code for the Intel CPU is a joke. If you write intrinsics for one arch, write them for both.
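For readers unfamiliar with the difference, here is a minimal, hypothetical illustration (not code from this thread or from OpenCV) of a plain C loop whose vectorization is left to the compiler, next to a hand-written SSE version using the standard `<xmmintrin.h>` intrinsics. The `saxpy_*` names and the choice of a saxpy kernel are assumptions made purely for illustration; the sketch assumes an x86 target with SSE.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ... */

/* Hypothetical scalar reference: y[i] = a * x[i] + y[i].
   Whether (and how well) this is vectorized is up to the compiler. */
static void saxpy_scalar(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical hand-written SSE version: 4 floats per iteration,
   followed by a scalar loop for the leftover elements. */
static void saxpy_sse(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    const __m128 va = _mm_set1_ps(a);      /* broadcast a into all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);   /* unaligned 4-float loads */
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_storeu_ps(y + i, vy);          /* unaligned 4-float store */
    }
    for (; i < n; ++i)                     /* scalar tail */
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result. The point of the intrinsics version is that the 4-wide processing is spelled out explicitly rather than depending on the auto-vectorizer recognizing the loop.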

I'd bet money the Intel side could be made 2.5-3x faster with proper SSE intrinsics, and maybe 5-6x faster on a Haswell i3 using AVX (Sandy Bridge doesn't have the cache bandwidth to fully utilise AVX).
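As a rough sketch of what an AVX path might look like, the same hypothetical saxpy kernel can be written with 256-bit `<immintrin.h>` intrinsics, which process 8 floats per instruction instead of SSE's 4. This assumes GCC or Clang: `__attribute__((target("avx")))` compiles only that function for AVX, and `__builtin_cpu_supports("avx")` selects it at run time so the program still runs on CPUs without AVX. It uses plain AVX only, not the AVX2/FMA instructions Haswell adds. Whether the wider vectors actually pay off depends on memory and cache bandwidth, as the comment above points out. All function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h> /* AVX intrinsics: __m256, _mm256_loadu_ps, ... */

/* Hypothetical AVX kernel: 8 floats per iteration. The target attribute
   (GCC/Clang) enables AVX for this function only. */
__attribute__((target("avx")))
static void saxpy_avx(size_t n, float a, const float *x, float *y)
{
    const __m256 va = _mm256_set1_ps(a);     /* broadcast a into 8 lanes */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);  /* unaligned 8-float loads */
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_add_ps(_mm256_mul_ps(va, vx), vy);
        _mm256_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)                       /* scalar tail */
        y[i] = a * x[i] + y[i];
}

/* Hypothetical portable fallback for CPUs without AVX. */
static void saxpy_portable(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Run-time dispatch: take the AVX path only if the CPU reports AVX. */
static void saxpy_dispatch(size_t n, float a, const float *x, float *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    if (__builtin_cpu_supports("avx"))
        saxpy_avx(n, a, x, y);
    else
        saxpy_portable(n, a, x, y);
}
```

Dispatching at run time rather than compiling the whole program with `-mavx` keeps a single binary usable on both Sandy Bridge-era and older parts.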


The original OpenCV code already has intrinsics in many places, but enabling them yields only a 10% improvement.

We decided to report the non-intrinsics version, because labeling the original OpenCV numbers with intrinsics as "SSE optimized" would be unfair to Intel. Apparently that intrinsics code is not very well optimized.

My own guess is that if we added Intel intrinsics to our own C code, it would be around 2x faster. We could have written the blog post without reporting the Intel C-optimized numbers, but that would have been unfair to Intel again.