| > I'm a huge fan of aarch64, it's a very well designed ISA I totally agree. I would go as far as to say that it's the best "general purpose" ISA available today. I am under the impression that the design was heavily data driven, building on decades of industry experience in many different sectors and actually providing efficient instructions for the operations that are used the most in real code. > I only really defend x86 because nobody else does :-D I can easily identify with that position. > I get the impression that Intel are planning to eventually abandon their P core arch Very interesting observation. It makes a lot of sense. I also think that we will see more hybrid solutions. Looking at the Samsung Exynos 2200, as an example from the low-power segment, it's obvious that the trend is towards heterogeneous core configurations (1 Cortex-X2 + 3 Cortex-A710 + 4 Cortex-A510): https://locuza.substack.com/p/die-analysis-samsung-exynos-22... Heterogeneous core configurations have only recently made it to x86, and I think they can extend the lifetime of x86. For laptops, I can see an x86 solution where you have a bunch of very simple and power-efficient cores at the bottom, which perhaps even use something like software-aided decoding (which appears to be more power-efficient than pure hardware decoding) and/or loop buffers (to power down the front end most of the time). Then you build on top of that with a few "good" E-cores, and only one or two really fast cores for single-threaded apps. For servers I think that having many good E-cores would be a better fit. Kind of similar to the direction AMD is taking with their Bergamo EPYC parts (though technically Bergamo is not an E-core, it gives more cores at the same TDP). |
Yeah, that's the impression I get too. I also get the impression they were planning ahead for the very wide GBOoO designs (I think Apple had quite a bit of influence on the design, and they were already working on a very wide GBOoO microarch), so there is a bias towards a very dense fixed-width encoding, at the expense of increased decoding complexity.
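One concrete case of that density-versus-decode trade (my own pick of example, so treat it as an illustration rather than the motivating case) is how AArch64 encodes bitmask immediates for logical instructions like AND/ORR/EOR. A 64-bit repeating bit pattern is packed into just 13 bits (N, immr, imms), and the decoder has to expand it. Below is a rough Python sketch of that expansion, following my reading of the architecture manual's DecodeBitMasks pseudocode. The function name is made up for this example.

```python
def decode_logical_imm64(n, immr, imms):
    """Expand a 13-bit AArch64 bitmask immediate (N:immr:imms) to 64 bits.

    Hypothetical helper name; the logic is a sketch of DecodeBitMasks
    for the 64-bit, immediate=TRUE case.
    """
    # Element size is derived from the highest set bit of N:NOT(imms).
    combined = (n << 6) | (~imms & 0x3F)
    length = combined.bit_length() - 1
    if length < 1:
        raise ValueError("reserved encoding")
    levels = (1 << length) - 1
    s = imms & levels          # (number of ones in the element) - 1
    r = immr & levels          # right-rotation amount
    if s == levels:
        raise ValueError("reserved encoding (all-ones element)")
    esize = 1 << length        # element size: 2, 4, 8, 16, 32 or 64 bits
    elem_mask = (1 << esize) - 1
    welem = (1 << (s + 1)) - 1
    # Rotate the run of ones right by r within the element.
    elem = ((welem >> r) | (welem << (esize - r))) & elem_mask
    # Replicate the element to fill 64 bits.
    result = 0
    for i in range(64 // esize):
        result |= elem << (i * esize)
    return result


print(hex(decode_logical_imm64(0, 0, 0b111100)))
print(hex(decode_logical_imm64(1, 0, 0b000111)))
```

That is a priority encoder, a rotate and a replicate just to materialise a constant, which is a fair amount of logic compared with a plain sign-extended field.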
ARM weren't even targeting the ultra low end, as they have a completely different -M ISA for that.
This is in contrast to RISC-V. Not only do they target the entire range from ultra low end to high performance, but the resulting ISA feels like it has a bias towards ultra low gate count designs (the way immediates are encoded points towards this).
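To make the immediates point concrete, here is a small Python sketch of decoding the I-type and J-type immediates from a 32-bit RV32I instruction word. The function names are invented for this example. The J-type bits look scrambled from a software point of view, but the sign bit always sits at instruction bit 31 and most immediate bits stay in the same instruction positions across formats, which, as I understand the spec's rationale, keeps the hardware muxing small.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_i_imm(inst):
    """I-type: imm[11:0] lives in inst[31:20]."""
    return sign_extend((inst >> 20) & 0xFFF, 12)


def decode_j_imm(inst):
    """J-type (JAL): inst[31:12] holds imm[20|10:1|11|19:12]."""
    imm = (
        (((inst >> 31) & 0x1) << 20)     # imm[20]    <- inst[31] (sign bit)
        | (((inst >> 21) & 0x3FF) << 1)  # imm[10:1]  <- inst[30:21]
        | (((inst >> 20) & 0x1) << 11)   # imm[11]    <- inst[20]
        | (((inst >> 12) & 0xFF) << 12)  # imm[19:12] <- inst[19:12]
    )
    return sign_extend(imm, 21)


# 0xFFDFF06F is "jal x0, -4"; 0xFFF00093 is "addi x1, x0, -1".
print(decode_j_imm(0xFFDFF06F))
print(decode_i_imm(0xFFF00093))
```

In both formats the sign comes from inst[31], so sign extension does not have to wait for the format to be known. That is the kind of choice that saves gates in a tiny core but makes the encoding awkward to read by hand.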
---------------
You might hate me for this, but I have to raise the question:
Does AArch64 actually count as a RISC ISA?
It might have the fixed width encoding and load/store arch we typically associate with RISC ISAs. But there is one major difference that arguably disqualifies it on a technicality.
All the historic RISC ISAs were designed in parallel with the first-generation microarchitecture of the first CPU and were hyper-optimised for that microarchitecture (often to a fault, leaving limited room for expansion and introducing "mistakes" like branch delay slots). Such ISAs were usually very simple to decode, which led to the famous problems that RISC had with code density.
I lean towards the opinion that this tight coupling between ISA and RISC microarchitecture is another fundamental aspect of a RISC ISA.
But AArch64 was apparently designed by committee, independent of any single microarchitecture. And they apparently had a strong focus on code density.
The result is something that is notably different from any other RISC ISA.
You could make a similar argument about RISC-V: it was also designed by committee, independent of any single microarchitecture. But they also did so with an explicit intention to make a RISC ISA, and the end result feels very RISC to me.
> that perhaps even uses something like software-aided decoding (which appears to be more power-efficient than pure hardware decoding)
At this point, I have very little hope for software-aided decoding. Transmeta tried it, Intel kind of tried it with software x86 emulation on Itanium, Nvidia bought the Transmeta patents and tried it again with Denver.
None of these attempts worked well, so I kind of have to conclude it's a flawed idea.
Though the flaw was probably the statically scheduled VLIW arch they were translating into. Maybe if you limited your software decoding to just taking off the rough edges of x86 instruction encoding it could be a net win.