by zurn 4678 days ago
Spoiler: no. 1265 ms vs 439 ms on their OpenCV benchmark.

(Then they play some what-if games by underclocking the i3 in imaginative ways and applying SIMD optimizations to only the ARM side.)

5 comments

We wanted to keep the clocks out of the picture. This, I must admit, is a little unfair, because the ability to run at higher clocks is a real capability of the CPU and cannot be discounted. In this case I compared a Cortex-A15 with a Core i3. The A15 can run at a maximum of 2 GHz, I think, while the Core i3 cores can run at up to 3 GHz. However, ARM has the Cortex-A57 coming along, which I believe will have similar performance to the A15 and will be able to run at higher clocks too. Of course, we need to wait for the A57 to appear on an actual SoC and measure it before we can truly make that claim.
I guess nobody really cares about clocks. What people care about is either maximum performance or maximum performance per watt (CPU and/or whole system). You should be benchmarking for those two scores.
This.

Assuming the compiler will generate good SSE code for the Intel CPU is a joke. If you write intrinsics for one arch, write them for both.

I'd bet money the Intel side could be made 2.5-3x faster with proper SSE intrinsics and maybe 5-6x faster with a Haswell i3 using AVX (SandyBridge doesn't have the cache bandwidth to fully utilise AVX properly).

The original OpenCV code already has intrinsics in many portions of the code. But enabling them results only in a 10% improvement.

We decided to report the non-intrinsics version, because reporting the original OpenCV numbers with intrinsics as SSE-optimized would be unfair to Intel. Apparently it's not very well optimized.

My own guess is that if we added intrinsics for Intel to our own C code, it would give a boost of around 2x. We could have written a blog without reporting the Intel C optimized numbers, but that would have been unfair to Intel again.

A very primitive benchmark: it compares a desktop i3 with ARM without measuring power consumption, even though the target is embedded use!

It would be informative to see a comparison of the power used to process the same tasks.

So what was the power consumption in every reported run? What was it when the i3 was underclocked?
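
To make the comparison being asked for concrete, here is a minimal sketch of the arithmetic, not anything taken from the blog. Only the two run times (1265 ms and 439 ms) come from the summary at the top of the thread. The function names are made up for illustration, and the wattages are placeholders chosen only so the example runs, since no power was measured.

```python
def speedup(slow_ms: float, fast_ms: float) -> float:
    """How many times faster the quicker run is than the slower one."""
    return slow_ms / fast_ms


def energy_joules(time_ms: float, avg_power_w: float) -> float:
    """Energy for one run: average power (W) times run time (s)."""
    return avg_power_w * (time_ms / 1000.0)


def runs_per_joule(time_ms: float, avg_power_w: float) -> float:
    """Work done per unit of energy (higher is better)."""
    return 1.0 / energy_joules(time_ms, avg_power_w)


if __name__ == "__main__":
    # Run times quoted at the top of the thread.
    slow_ms, fast_ms = 1265.0, 439.0
    print(f"raw speed ratio: {speedup(slow_ms, fast_ms):.2f}x")

    # PLACEHOLDER average power draws in watts. These are NOT measurements;
    # substitute real readings for each platform under the same workload.
    placeholder_power_slow_w = 1.0
    placeholder_power_fast_w = 1.0
    print("energy, slower run (J):", energy_joules(slow_ms, placeholder_power_slow_w))
    print("energy, faster run (J):", energy_joules(fast_ms, placeholder_power_fast_w))
```

With measured wattages plugged in, the chip with the lower energy per run wins on the efficiency score even if it loses on raw time, which is why both numbers are worth reporting.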

We don't report power because we don't have a way to measure it accurately.
Also, we have made no effort to hide this information.

We have applied SIMD optimizations only to ARM because that's our business: licensing computer vision algorithms on ARM.

The blog is a by-product of that effort.

but also, maybe? http://liliputing.com/2013/07/intel-atom-z3770-bay-trail-chi...

There's some evidence that the new Bay Trail Atoms should be pretty competitive against current ARM stuff.