| Yeah, I agree that the writing is on the wall for x86. As you said, power consumption does matter for laptops and server farms. I'm a huge fan of aarch64, it's a very well designed ISA. I bought a Mac so I could get my hands of a high-preformance aarch64 core. I love the battery life and I'm not going back. I only really defend x86 because nobody else does, and then people dog pile on it talking it down and misrepresenting what a modern x86 pipeline is. Though I wouldn't write x86 off yet. I get the impression that Intel are planning to eventually abandon their P core arch (the one with direct lineage all the way back to the original Pentium Pro). They haven't being doing much innovation on it. Intel's innovation is actually focused on their E core arch, which started as the Intel Atom and wasn't even out-of-order. It's slowly evolved over the years with a continued emphasis on low-power consumption until it's actually pretty completive with the P core arch. If you compare Golden Cove and Gracemont, the frontend is radically different. Golden Cove has a stupid 6 wide decoder that can deliver 8 uops per cycle... though it's apparently sitting idle 80% of the time (power gated) thanks to the 4K uop cache. Gracemont doesn't have a uop cache. Instead it uses the space for a much larger instruction cache and two instruction encoders running in parallel, each 3-wide. It's a much more efficient way to get 6-wide instruction decoding bandwidth, I assume they are tagging decode boundaries in the instruction cache. Gracemont is also stupidly wide. Golden cove only has 12 execution unit ports, Gracemont has 17. It's a bit narrow in other places (only 5 uops per cycle between the front-end and backend) but give it a few more generations and a slight shift in focus and it could easily outperform the P core. Perhaps add a simple loop-stream buffer and upgrade to three or four of those 3-wide decoders running in parallel. Such a design would have a significantly lower x86 tax. Low enough to save them in the laptop and server farm market? I have no idea. I'm just not writing them off. |
I totally agree. I would go as far as to say that it's the best "general purpose" ISA available today. I am under the impression that the design was heavily data driven, building on decades of industry experience in many different sectors and actually providing efficient instructions for the operations that are used the most in real code.
> I only really defend x86 because nobody else does
:-D I can easily identify with that position.
> I get the impression that Intel are planning to eventually abandon their P core arch
Very interesting observation. It makes a lot of sense.
I also think that we will see more hybrid solutions. Looking at the Samsung Exynos 2200, for an example from the low-power segment, it's obvious that the trend is towards heterogeneous core configurations (1 Cortex-X2 + 3 Cortex-A710 + 4 Cortex-A510): https://locuza.substack.com/p/die-analysis-samsung-exynos-22...
Heterogeneous core configurations has only just recently made it to x86, and I think it can extend the lifetime of x86.
For laptops, I can see an x86 solution where you have a bunch of very simple and power-efficient cores in the bottom, that perhaps even uses something like software-aided decoding (which appears to be more power-efficient than pure hardware decoding) and/or loop buffers (to power down the front end most of the time). And then build on top of that with a few "good" E-cores, and only one or two really fast cores for single-threaded apps.
For servers I think that having many good E-cores would be a better fit. Kind of similar to the direction AMD is taking with their Bergamo EPYC parts (though technically Bergamo is not an E-core, it gives more cores at the same TDP).