by berkut
4687 days ago
This. Assuming the compiler will generate good SSE code for the Intel CPU is a joke. If you write intrinsics for one arch, write them for both. I'd bet money the Intel side could be made 2.5-3x faster with proper SSE intrinsics, and maybe 5-6x faster on a Haswell i3 using AVX (Sandy Bridge doesn't have the cache bandwidth to fully utilise AVX).
We decided to report the non-intrinsics version because reporting the original OpenCV numbers with intrinsics as "SSE optimized" would be unfair to Intel. Apparently it's not very well optimized.
My own guess is that if we add intrinsics for Intel to our own C code, it will give a boost of around 2x. We could have written the blog post without reporting the Intel optimized-C numbers, but that would have been unfair to Intel again.
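
For anyone wondering what "adding intrinsics" means in practice, here is a minimal sketch of the same kernel written both ways. It is not taken from OpenCV or from the benchmarked code; the function names `add_sat_u8_scalar` and `add_sat_u8_simd` are made up for illustration, and the kernel (a saturating add of two 8-bit buffers) is just an assumed stand-in for a typical image operation. The SSE2 path is guarded so the file still builds on non-x86 targets, where it simply falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Plain C baseline: saturating add, one byte at a time.
 * Whether this gets vectorized is up to the compiler. */
void add_sat_u8_scalar(const uint8_t *a, const uint8_t *b,
                       uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Hypothetical hand-written version: 16 bytes per iteration with SSE2,
 * then the scalar routine handles whatever is left over. */
void add_sat_u8_simd(const uint8_t *a, const uint8_t *b,
                     uint8_t *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        /* Unaligned loads, so the buffers need no special alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_adds_epu8: unsigned 8-bit add with saturation. */
        _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epu8(va, vb));
    }
#endif
    add_sat_u8_scalar(a + i, b + i, out + i, n - i);
}
```

Both functions should produce identical output, so any speedup has to be measured rather than assumed; how much the explicit version gains depends on the compiler, the flags, and the CPU.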