by sakras 1261 days ago
I’m not convinced that’ll bring about any significant change. Any power savings from switching from x86 to a RISC ISA come from simplifying the instruction decoder, which seems to be worth about 15-20% if we compare the Ampere Altra to a comparable AMD chip. That’s not an order of magnitude.

On the other hand, on the order of 80% of a chip’s power is spent on OOO execution. If you want the order of magnitude improvement in power efficiency, you need to dump superscalar/OOO in favor of smart compilers and VLIW. Cheap DSPs have been doing it for years, but compilers aren’t good enough yet for general purpose processing.


Agree that OoO is the big cost. But we can also mitigate that without VLIW: SIMD/vector reduces the instruction count by ~5x, and energy by a similar factor.

And a portable API such as Highway also helps us move the same code from x86 to Arm or RISC-V with just a recompile :D

It's not clear that even a super smart compiler can do this. The best schedule depends on the latency of instructions. This is a problem because we can't know statically whether a particular memory load is in L1/L2/L3/DRAM/etc., as this can vary for different executions of the same load instruction.
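
To make the scheduling problem concrete, here is a toy sketch of a statically scheduled, in-order core that stalls whenever a load takes longer than the compiler planned for. The latency figures and the `stall_cycles` helper are illustrative assumptions, not measurements of any real chip.

```python
def stall_cycles(assumed_latency, actual_latency):
    """Cycles an in-order core waits when a load is slower than scheduled.

    The compiler placed the load's consumer `assumed_latency` cycles after
    the load; any extra latency shows up as a stall.
    """
    return max(0, actual_latency - assumed_latency)


# Hypothetical load-to-use latencies in cycles (assumed for illustration).
LATENCY = {"L1": 4, "L2": 14, "L3": 40, "DRAM": 200}

# The static schedule must pick one latency; here it optimistically assumes L1.
assumed = LATENCY["L1"]
for level, actual in LATENCY.items():
    print(f"load served from {level}: {stall_cycles(assumed, actual)} stall cycles")
```

Assuming a longer latency instead removes the stalls but needs that much independent work to fill the gap, which is the trade-off the static schedule cannot resolve per execution.
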
According to [1], 88% of the speedup given by OOO processors is due to speculation, and reordering in the case of cache misses accounts for only around 10% of the speedup. If OOO in general gives around a 50% speedup compared to in-order designs, then reordering in the face of cache misses gives only around a 5% speedup (10% of 50%). If you use a good static schedule with speculation, you'll get the bulk of the speedup. The rest can be recovered by increasing the clock rate by 5%, since you'll have gotten rid of so much silicon.

[1] https://doi.org/10.1145/2451116.2451143
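
The arithmetic above, written out as a quick sketch. It assumes the 50% overall OOO gain from the comment and splits it linearly by the shares quoted from [1]; `split_speedup` is just a hypothetical helper name.

```python
def split_speedup(total_gain, shares):
    """Split a fractional speedup linearly among named contributors."""
    return {name: total_gain * share for name, share in shares.items()}


# Figures quoted in the comment; the 50% overall gain is its own assumption.
OOO_GAIN = 0.50
SHARES = {"speculation": 0.88, "reordering_on_cache_miss": 0.10}

parts = split_speedup(OOO_GAIN, SHARES)
for name, gain in parts.items():
    print(f"{name}: {gain:.1%}")
```

Under these assumptions speculation is worth about 44 points of the 50% gain and cache-miss reordering about 5.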

Nice paper!
> compilers aren’t good enough yet

So we need to go back to coding in assembler to save the planet? Sign me up!