We wanted to keep the clocks out of the picture. This, I must admit, is a little unfair, because the ability to run at higher clocks is indeed a capability of the CPU and cannot be discounted. In this case I compared a Cortex-A15 with a Core i3. The A15 can run at a maximum of 2 GHz, I think, while the Core i3 cores can run at up to 3 GHz. However, ARM has the Cortex-A57 coming along, which I believe will have similar performance to the A15 and will be able to run at higher clocks too. Of course, we need to wait for the A57 to appear on an actual SoC and measure it before we can truly make that claim.
I guess nobody really cares about clocks. What people care about is either maximum performance or maximum performance per watt (CPU and/or whole system). You should be benchmarking for those two scores.
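To make the scores under discussion concrete, here is a minimal sketch in C of how a clock-normalized score and a performance-per-watt score could be derived from one benchmark run. The struct, its field names, and the choice of frames per second as the throughput unit are assumptions made for illustration; none of them come from the original benchmark.

```c
/* Hypothetical record of one benchmark run. All fields are assumed
 * measurements supplied by the caller, not results from the post. */
typedef struct {
    double frames_per_sec; /* raw throughput (maximum performance) */
    double clock_ghz;      /* core clock during the run, in GHz */
    double power_watts;    /* average power draw (CPU or whole system) */
} bench_sample;

/* Clock-normalized score: throughput per GHz, which keeps clocks
 * "out of the picture" when comparing two cores. */
double perf_per_ghz(const bench_sample *s)
{
    return s->frames_per_sec / s->clock_ghz;
}

/* Efficiency score: throughput per watt. */
double perf_per_watt(const bench_sample *s)
{
    return s->frames_per_sec / s->power_watts;
}
```

Raw `frames_per_sec` answers the maximum-performance question, `perf_per_watt` answers the efficiency question, and `perf_per_ghz` is the clock-independent comparison. Whether `power_watts` covers the CPU alone or the whole system has to be stated alongside the number, since the two can rank chips differently.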
Assuming the compiler will generate good SSE code for the Intel CPU is a joke. If you write intrinsics for one arch, write them for both.
I'd bet money the Intel side could be made 2.5-3x faster with proper SSE intrinsics and maybe 5-6x faster with a Haswell i3 using AVX (Sandy Bridge doesn't have the cache bandwidth to fully utilise AVX properly).
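As an illustration of what "intrinsics for both" can look like, here is a minimal sketch of a single kernel with an SSE2 path, a NEON path, and a scalar fallback. The kernel (a saturating add of 8-bit pixels) and the function name `add_saturate_u8` are hypothetical and chosen only for the example; this is not code from OpenCV or from the benchmark discussed here.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>   /* NEON intrinsics */
#endif

/* Hypothetical kernel: dst[i] = min(a[i] + b[i], 255) for n bytes. */
void add_saturate_u8(const uint8_t *a, const uint8_t *b,
                     uint8_t *dst, size_t n)
{
    size_t i = 0;

#if defined(__SSE2__)
    /* x86 path: 16 pixels per iteration, unaligned loads and stores. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* ARM path: the same 16 pixels per iteration using NEON. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb));
    }
#endif

    /* Scalar code handles the tail, and the whole array when
     * neither SIMD path was compiled in. */
    for (; i < n; ++i) {
        unsigned s = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}
```

Both SIMD paths process the same number of elements per iteration and share one scalar tail, so any measured difference comes from the hardware rather than from one side getting a better-tuned loop. The preprocessor checks assume GCC or Clang; on 32-bit ARM the NEON path typically also needs a flag such as `-mfpu=neon`.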
The original OpenCV code already has intrinsics in many portions of the code. But enabling them results only in a 10% improvement.
We decided to report the non-intrinsics version, because reporting the original OpenCV numbers with intrinsics as SSE-optimized would be unfair to Intel. Apparently it's not very well optimized.
My own guess is that if we add intrinsics for Intel to our own C code, it will boost performance by around 2x.
We could have written the blog post without reporting the Intel C-optimized numbers, but that would have been unfair to Intel again.