Hacker News
by shornlacuna 4531 days ago
Superior performance per watt.
3 comments

X86 chips (especially Intel's) are leagues ahead in terms of performance per watt.

In fact, not only in performance per watt but also in performance per dollar. It's just that ARM designs for lowest power consumption while Intel/AMD design for maximum performance.

There are three metrics getting thrown around:

* Performance per dollar operating cost (performance per watt is closest to this)

* Performance per dollar capital expenditure (important for desktop systems, where operating costs are low)

* Performance per dollar TCO (sum of the above two)

The third one is the important one.
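To make the three metrics concrete, here is a minimal sketch of the third one. The function name, its parameters, and both machines are hypothetical, and operating cost is assumed to be electricity only (no cooling, rack space, or staff).

```python
def perf_per_dollar_tco(throughput, capex_dollars, avg_watts, years, dollars_per_kwh):
    """Throughput per dollar of TCO, where TCO = capital cost + energy cost."""
    hours = years * 365 * 24
    # Assumption: operating cost is only the electricity the machine draws.
    opex_dollars = avg_watts / 1000 * hours * dollars_per_kwh
    return throughput / (capex_dollars + opex_dollars)


# Hypothetical machines; the numbers are illustrative, not measurements.
fast_hot = perf_per_dollar_tco(
    throughput=100, capex_dollars=2000, avg_watts=200, years=3, dollars_per_kwh=0.10
)
slow_cool = perf_per_dollar_tco(
    throughput=30, capex_dollars=500, avg_watts=30, years=3, dollars_per_kwh=0.10
)

print(f"fast/hot : {fast_hot:.4f} work units per TCO dollar")
print(f"slow/cool: {slow_cool:.4f} work units per TCO dollar")
```

With these made-up inputs the cheaper, cooler machine comes out ahead, but changing the purchase price or electricity rate can flip the ranking, which is why the TCO figure matters more than either component alone.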

Good summary, but after reading this thread I'm still confused.

Why ARM?

How does the ISA impact the aforementioned criteria?

Why would a phone demand a different ISA?

The original choice of ARM for mobile and x86 for desktop is basically a historical accident.

The differences between modern ARM CPUs and modern x86 CPUs have less to do with the ISA itself and more to do with the way ARM CPUs have been designed to be low-power for decades and have worked their way up the performance scale, while x86 has been designed for performance and has only lately been emphasizing low power. These lead to different design points.

Because everything today is about the heat generated by computation. In a phone, it wastes the battery and is unpleasant for the user. In the datacentre, heat determines how much computation you can do in the volume of space you have, and how much you have to spend on cooling systems (the running of which is expensive too). So datacentre operators that already have a building are facing a choice: get a new building, or make better use of the one they have.

ARM cores are typically slower in absolute terms than Intel cores, but at a given level of power, you can run more of them.

What evidence is there that performance per watt is actually better on ARM when dealing with server processors?

Because there isn't any type of x86 processor that beats a comparable ARM processor for efficiency. If you could make an efficient x86 processor, Atom would be it, and it's less efficient than ARM.

The x86 ISA fundamentally takes more silicon to implement than ARM. More gates = more power.

Everything Intel sells today clobbers any currently-marketed ARM chip on per-unit-energy computation performed. The race is not even close. ARM is only of interest if you are constrained by something other than compute (phones) or you don't know how to program and you are wasting most of the performance of your Xeons. The latter category contains nearly the entire enterprise software market and most other programmers as well.

Or, your program is entirely constrained by IO so most of the power of Xeon is wasted, while you still have to pay the premium for it.

This chip is interesting not because of the CPU core in it, but because it has two presumably fast 10GbE interfaces and the possibility of a large amount of RAM in a cheap-ish chip.

> More gates = more power.

This is not strictly true; processor throughput also matters.

Total Power consumed = Power consumed by gates * Time taken to finish the job

There's another variable to throw into the mix: all gates are not created equal. A 28nm gate (this new processor) takes a lot more power than a 22nm gate (new Intel processors).

Strictly speaking:

Total energy consumed = Power * Time
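A quick sketch of why throughput matters under this formula: a chip that draws more power can still use less energy for a job if it finishes sooner. Both chips and all numbers below are hypothetical.

```python
def energy_joules(power_watts, time_seconds):
    """Energy = power * time (watts * seconds = joules)."""
    return power_watts * time_seconds


# Hypothetical chips running the same job; figures are illustrative only.
big_core = energy_joules(power_watts=80.0, time_seconds=10.0)    # fast, power-hungry
small_core = energy_joules(power_watts=20.0, time_seconds=60.0)  # slow, frugal

print(f"big core:   {big_core:.0f} J")
print(f"small core: {small_core:.0f} J")
```

Here the chip drawing four times the power still spends less energy overall, because it finishes six times faster.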

Do you have a source for any of this? x86 is much more powerful than ARM by watt, being exponentially faster at most math. I've never had anyone seriously propose that ARM is more efficient than x86 at anything other than not pulling watts from a Li-ion battery.

Can you elaborate what you mean by "exponentially"?

For ARMv7 vs x86, yes, x86 just destroys ARMv7 (Cortex A15 etc.) in double (float64) performance.

While I do think x86 is still faster vs ARMv8, the gap is likely much less per GHz, because ARMv8 Neon now supports doubles much like SSE. Of course Haswell has wider AVX (256-bit) and ability to issue two 256-bit wide FMAs per cycle (16 float64 ops). Cortex A57 can handle just 1/4th of that, 4 FMA float64 ops per cycle.
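The per-cycle figures above follow from simple arithmetic: FMA issues per cycle, times float64 lanes per vector, times two operations per FMA (a multiply and an add). A minimal sketch, where the function name is hypothetical and the Cortex A57 configuration (one 128-bit FMA issue per cycle) is an assumption chosen only to reproduce the comment's figure of 4:

```python
def peak_float64_ops_per_cycle(fma_issues_per_cycle, vector_bits):
    """Peak float64 ops per cycle, counting each FMA as two operations."""
    lanes = vector_bits // 64  # float64 values per vector register
    return fma_issues_per_cycle * lanes * 2


# Haswell as described in the comment: two 256-bit FMAs per cycle.
haswell = peak_float64_ops_per_cycle(fma_issues_per_cycle=2, vector_bits=256)

# Assumed configuration that matches the comment's "4 ops per cycle" for Cortex A57.
cortex_a57 = peak_float64_ops_per_cycle(fma_issues_per_cycle=1, vector_bits=128)

print(haswell, cortex_a57, haswell // cortex_a57)
```

These are peak numbers per clock cycle only; they say nothing about clock speed, memory bandwidth, or how often real code actually sustains back-to-back FMAs.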

That said, low- to mid-level servers are not really crunching many numbers. They're all about branchy code such as business logic, encoding / decoding, etc. Or waiting for I/O to complete.

So why would you care about math in a low end server CPU if it's not being used anyways?