by titzer
2023 days ago
I think the M1 chip finally proves the inherent design superiority of RISC over CISC. For years, Intel stayed ahead of all its competitors by having the best process, the highest clock speeds, and the most advanced out-of-order execution. By internally decoding CISC into RISC-like micro-ops, Intel could feed a large number of execution ports to extract maximum ILP. They had to spend gobs of silicon on that: complex decoding (made worse by the legacy of x86's encodings), complex branch prediction, and all the real estate that out-of-order execution takes. They could afford it because they were ahead of everyone else in transistor count. But all of that went bye-bye when Intel lost the process edge and, with it, the transistor-count advantage.

Now, with the 5nm process, others can field gobs of transistors, and they don't have the x86 frontend millstone around their necks. So ARM64 unlocked a lot of frontend bandwidth to feed even more execution ports. And with the transistor budget so high, 8 massive cores could be put on a die.

People have argued for decades that the instruction density of CISC is a major advantage, because that density makes better use of I-cache and memory bandwidth. But it looks like decode bandwidth is what really matters. That, and RISC usually requires aligned instructions, which means branch density can't get too high, so branch prediction data structures are simpler and more effective. (Intel still has weird slowdowns if you have too many branches in a cache line.) It seems frontend effects are real.
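To make the "x86 frontend millstone" concrete, here is a toy sketch of why wide decode is cheap for fixed-width instructions and expensive for variable-length ones. The encoding is hypothetical (the first byte of each variable-length instruction holds its total length) and far simpler than real x86; the function names are illustrative only. The speculative version only loosely mimics the length-marking tricks wide x86 frontends rely on: guess a length at every byte in parallel, then keep the guesses that chain from the start.

```python
# Toy model of instruction-boundary finding. The encoding is HYPOTHETICAL:
# the first byte of each variable-length instruction holds its total length.
# Real x86 length decoding (prefixes, ModRM, SIB, immediates) is far messier.

def fixed_width_boundaries(code: bytes, width: int = 4) -> list[int]:
    """Instruction starts for a fixed-width ISA (AArch64 uses 4-byte instructions).
    Each start is a simple multiple of `width`, so N decoders can each grab
    their slot independently."""
    return list(range(0, len(code), width))


def variable_width_boundaries(code: bytes) -> list[int]:
    """Instruction starts for the toy variable-length ISA.
    Each start depends on the length of the previous instruction, so a
    naive decoder has to walk the stream one instruction at a time."""
    starts, pc = [], 0
    while pc < len(code):
        length = code[pc]  # toy rule: first byte = total length
        if not 1 <= length <= 15:  # x86 caps instructions at 15 bytes
            raise ValueError(f"invalid length {length} at offset {pc}")
        starts.append(pc)
        pc += length
    return starts


def speculative_boundaries(code: bytes) -> list[int]:
    """Spend extra work to go wider: guess a length at EVERY byte offset
    (parallelizable in hardware, but most guesses are thrown away), then
    follow the chain from offset 0 to keep only the real starts."""
    guesses = [code[i] for i in range(len(code))]  # stands in for one length decoder per byte
    starts, pc = [], 0
    while pc < len(code):
        if not 1 <= guesses[pc] <= 15:
            raise ValueError(f"invalid length {guesses[pc]} at offset {pc}")
        starts.append(pc)
        pc += guesses[pc]
    return starts


if __name__ == "__main__":
    toy = bytes([2, 0, 3, 0, 0, 1, 4, 0, 0, 0])
    print(variable_width_boundaries(toy))     # [0, 2, 5, 6]
    print(speculative_boundaries(toy))        # [0, 2, 5, 6]
    print(fixed_width_boundaries(bytes(16)))  # [0, 4, 8, 12]
```

The speculative version finds the same boundaries, but only by doing a length guess at all ten byte offsets to keep four of them. That wasted work is the kind of silicon cost the comment describes.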
I bet doing 8-wide x86 decoding would be tough, but once you've got a micro-op cache, it's doable as long as you hit in it. Zen 3 is 8-wide the 95% of the time you hit the micro-op cache.
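A rough way to see why the hit rate matters is a blended-throughput model. This sketch assumes the comment's 95% is the micro-op cache hit rate, that the op cache supplies 8 ops/cycle, and that the legacy x86 decoders supply 4 per cycle on a miss (the 4-wide figure is an assumption here, not from the comment). It ignores switch penalties, fetch stalls, and backend limits.

```python
# Simple blended-throughput model of a frontend with a micro-op cache.
# ASSUMPTIONS: the op cache delivers 8 ops/cycle and the legacy x86 decoders
# deliver 4 instructions/cycle on a miss; switch penalties, fetch stalls,
# and backend limits are ignored.

def effective_frontend_width(hit_rate: float,
                             op_cache_width: int = 8,
                             legacy_width: int = 4) -> float:
    """Average ops per cycle the frontend can supply, weighted by hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * op_cache_width + (1.0 - hit_rate) * legacy_width


if __name__ == "__main__":
    for rate in (0.95, 0.80, 0.50):
        print(f"hit rate {rate:.0%}: {effective_frontend_width(rate):.1f} ops/cycle")
```

Under these assumptions a 95% hit rate gives about 7.8 ops/cycle, 80% gives 7.2, and 50% drops to 6.0, so the "8-wide" label holds up only while the working set fits in the op cache.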
The real question is how Apple keeps that thing fed. An 8-wide decoder is pointless if most of the time you've got 6 empty pipelines: https://open.hpi.de/courses/parprog2014/items/aybclrPgY4nPyY... (discussing the ILP wall). M1 outperforms Zen 3 by 20% on the SPEC GCC benchmark at a one-third lower clock speed. That implies about 80% more work per clock than Zen 3, which is itself a large advance in ILP.
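The 80% figure follows from performance = IPC × clock frequency, so the IPC ratio is the performance ratio divided by the clock ratio: 1.2 / (2/3) = 1.8. The sketch below only re-derives that number from the comment's own figures; it assumes "one-third lower" means M1's clock is two-thirds of Zen 3's. Measured IPC is a proxy for extracted ILP, since it also reflects the compiler, caches, and memory system.

```python
# Back-of-the-envelope check of the per-clock claim, using only the comment's figures.
# Performance = IPC x clock frequency, so IPC_ratio = perf_ratio / clock_ratio.

def implied_ipc_ratio(perf_ratio: float, clock_ratio: float) -> float:
    """IPC of chip A relative to chip B, given A's speed and clock relative to B."""
    if clock_ratio <= 0:
        raise ValueError("clock_ratio must be positive")
    return perf_ratio / clock_ratio


if __name__ == "__main__":
    # M1 vs Zen 3 per the comment: 20% faster on SPEC GCC, clock one-third lower.
    ratio = implied_ipc_ratio(perf_ratio=1.20, clock_ratio=2 / 3)
    print(f"M1 IPC is about {ratio:.2f}x Zen 3's")  # 1.80x
```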