Hacker News

by userbinator 3929 days ago

The problem is that a simpler decoder doesn't compensate for the extra instruction cache needed to achieve the same hit rates/levels of performance, and that is bad for power efficiency since L1 cache needs to run at full core speed and in modern CPUs there's vastly more transistor area in the cache than the decoder. The increased memory traffic from lower hit rates also doesn't help. This article shows that effect quite clearly:

http://www.extremetech.com/extreme/188396-the-final-isa-show...

The x86s have 32K of L1 icache, the ARMs 32K or 16K, and the MIPS Loongson has 64K. Also, the Loongson does not support MIPS16 whereas the ARMs all support Thumb. If you look at the total energy consumed, the MIPS is noticeably worse than x86 or ARM:

http://www.extremetech.com/wp-content/uploads/2014/08/Averag...
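A toy simulation can make the code-density argument concrete. The sketch below rests on assumptions that are not from the articles: a fully associative LRU instruction cache (real L1 caches are set-associative), 64-byte lines, a hypothetical 10,000-instruction straight-line loop, and hypothetical average instruction lengths of 4 and 3 bytes. It shows how the same loop can thrash a 32K cache with a sparser encoding, fit with a denser one, and fit again with the sparser encoding once the cache is doubled to 64K.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache line size


def icache_hit_rate(code_bytes: int, cache_bytes: int, iterations: int = 10) -> float:
    """Hit rate of a loop spanning `code_bytes` of straight-line code in a toy
    fully associative LRU instruction cache of `cache_bytes` (assumed model)."""
    capacity_lines = cache_bytes // LINE_BYTES
    n_lines = -(-code_bytes // LINE_BYTES)  # ceiling division: lines the loop spans
    cache = OrderedDict()  # line index -> None, ordered least to most recently used
    hits = accesses = 0
    for _ in range(iterations):
        for line in range(n_lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)  # mark as most recently used
            else:
                cache[line] = None
                if len(cache) > capacity_lines:
                    cache.popitem(last=False)  # evict the least recently used line
    return hits / accesses


LOOP_INSTRUCTIONS = 10_000  # hypothetical loop size
for label, avg_len, cache_kib in [
    ("4-byte avg, 32K", 4, 32),
    ("3-byte avg, 32K", 3, 32),
    ("4-byte avg, 64K", 4, 64),
]:
    rate = icache_hit_rate(LOOP_INSTRUCTIONS * avg_len, cache_kib * 1024)
    print(f"{label}: {rate:.2f}")
```

Under these assumptions it prints hit rates of 0.00, 0.90 and 0.90. A cyclic loop even slightly larger than an LRU cache misses on every access, so the sparser encoding needs the bigger cache just to match the denser one.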

In fact, the cache takes so much power that Intel engineers have found it profitable to turn off parts of the cache when in low-power modes; this feature is called Dynamic Cache Sizing and appears in the later Atom series.

adwn 3929 days ago

> that is bad for power efficiency since L1 cache needs to run at full core speed and in modern CPUs there's vastly more transistor area in the cache than the decoder

It's not that simple. Dynamic power depends on the toggle rate of the flip-flops and the electrical capacitance of the fan-out wires and gates, not on the number of transistors. In a cache, very few storage elements change their state in every cycle, while the decoder performs a lot of work in every cycle.
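The usual first-order CMOS model behind this point is P = α·C·V²·f, where α is the activity (toggle) factor, C the switched capacitance, V the supply voltage and f the clock frequency. The sketch below plugs in purely hypothetical numbers, not measurements of any real chip, to show how a block with ten times the capacitance but a much lower activity factor can still dissipate less switching power. It ignores leakage (static) power, which does grow with transistor count.

```python
def dynamic_power(activity: float, capacitance_f: float, vdd_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power in watts: P = alpha * C * V^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical, illustrative values only.
VDD = 1.0   # supply voltage in volts
FREQ = 2e9  # 2 GHz clock
decoder_w = dynamic_power(activity=0.3, capacitance_f=100e-12, vdd_v=VDD, freq_hz=FREQ)
cache_w = dynamic_power(activity=0.01, capacitance_f=1e-9, vdd_v=VDD, freq_hz=FREQ)  # 10x the capacitance
print(f"decoder: {decoder_w:.3f} W, cache: {cache_w:.3f} W")
```

With these made-up inputs the decoder comes out at about 0.06 W and the cache at about 0.02 W; the point is only that α can outweigh C, not that these are realistic figures.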

hga 3929 days ago

Something I came across recently said that on x86, 65% of the power cost of a complete cache miss was in the logic of the cache hierarchy.

jensnockert 3929 days ago

It's even more complicated than that, since the cache doesn't have to hold encoded instructions; it can store decoded instructions instead, and a few of the caches on a modern x86 CPU actually do that. For example, there's a loop cache after the decoders, so small loops never have to be decoded more than once.
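The idea of caching decoded instructions can be sketched as a toy front end that memoizes decode results by fetch address. Everything here is hypothetical: the class and method names, the placeholder instruction bytes, and the fake "decode" step. Real loop buffers and decoded-op caches have a fixed capacity and must be invalidated when code changes, which this unbounded dictionary ignores.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFrontEnd:
    """Hypothetical front end that caches decoded ops by instruction address."""
    decode_count: int = 0
    decoded: dict = field(default_factory=dict)  # address -> tuple of decoded ops

    def _decode(self, raw: bytes) -> tuple:
        # Stand-in for the expensive variable-length decode step.
        self.decode_count += 1
        return tuple(f"op_{b:02x}" for b in raw)

    def fetch(self, addr: int, raw: bytes) -> tuple:
        ops = self.decoded.get(addr)
        if ops is None:  # miss: decode once and remember the result
            ops = self._decode(raw)
            self.decoded[addr] = ops
        return ops


# A hypothetical three-instruction loop body (placeholder bytes, not real encodings).
loop_body = {0x100: b"\x01\x02", 0x102: b"\x03", 0x103: b"\x04\x05\x06"}
fe = ToyFrontEnd()
for _ in range(1000):
    for addr, raw in loop_body.items():
        fe.fetch(addr, raw)
print(fe.decode_count)  # 3: each instruction is decoded only on its first pass
```

After 1,000 trips around the loop the decoder has run only three times; every later fetch is served from the cache of already-decoded ops.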

pcwalton 3929 days ago

> The problem is that a simpler decoder doesn't compensate for the extra instruction cache needed to achieve the same hit rates/levels of performance

Except this isn't true for x86-64, because x86-64 instructions are just as large as ARM instructions in practice.

rdc12 3928 days ago

And the MIPS chip is built on a 90nm process vs. the 32nm of the Sandy Bridge they tested. While that is relevant to what you can buy, it says nothing about the intrinsic properties of the design.

Intel has had a massive advantage in fabrication for a long time.
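For a rough sense of the size of that gap, idealized geometric scaling treats every linear dimension as shrinking by the ratio of the node names, so area per transistor shrinks by the square of that ratio. This is an idealization assumed for illustration: real processes don't scale this cleanly, and node names are only loosely tied to physical feature sizes.

```python
def ideal_shrink(old_nm: float, new_nm: float) -> tuple[float, float]:
    """Idealized linear and area scaling factors between two process nodes."""
    linear = old_nm / new_nm
    return linear, linear ** 2


linear, area = ideal_shrink(90, 32)
print(f"linear: {linear:.4f}x, area: {area:.2f}x")  # linear: 2.8125x, area: 7.91x
```

Even as a loose upper bound, a nearly 8x difference in ideal density is large enough to swamp most ISA-level effects in a power comparison.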
