| Yeah, that logic all seems to work out. I found annotated die shots of Zen 3 and Zen 4 that pretty much confirm the op cache: https://locuza.substack.com/p/zen-evolution-a-small-overview Pretty strong evidence that AMD are using a much simpler encoding scheme with roughly 64 bits per uop. Also, that uop cache on Zen 4 is starting to look ridiculously large. But that does give us a good idea of how big the microcode ROM is. If we go back to the previous Intel die shot with its combined microcode ROM + uop cache, it appears Intel's uop cache is actually quite small thanks to their better encoding. > Furthermore I assume that one x86 instruction translates to more than one uOP instruction on average (e.g. instructions involving memory operands are cracked I suspect it's not massively higher than one uop per instruction. Remember, the uop cache is in the fused-uop domain (so before memory cracking) and instruction fusion can actually squash some instruction pairs into a single uop. The bigger hindrance will be any rules that prevent every uop slot from being filled. Intel appears to have many such rules (at least for Sandy Bridge/Haswell/Skylake). > and blow lots of energy on the x86 translation front end TBH, we have no idea how big the x86 tax is. We can't just assume the difference in performance per watt between the average x86 design and the average high-performance aarch64 design is entirely caused by the x86 tax. Intel and AMD simply aren't incentivised to optimise their designs for low power consumption as their cores simply aren't used in mobile phones where ultra low power consumption is absolutely critical. |
Ooo thanks! Sure looks like strong evidence.
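For a rough sense of scale, here is a back-of-the-envelope sketch of what "roughly 64 bits per uop" would mean for raw op cache storage. The entry counts (4K for Zen 3, 6.75K for Zen 4) are the commonly cited figures and are an assumption here, not something read off the die shots; the 64-bit width is the estimate from the parent comment, and tags or other metadata are ignored.

```python
# Back-of-the-envelope op cache storage estimate.
# Assumptions (not measured from the die shots):
#   - entry counts are the commonly cited op cache capacities
#   - every uop takes a fixed 64 bits (the parent comment's estimate)
#   - tag and metadata overhead is ignored

ZEN3_ENTRIES = 4 * 1024            # assumed: 4K ops
ZEN4_ENTRIES = int(6.75 * 1024)    # assumed: 6.75K ops


def uop_cache_kib(entries: int, bits_per_uop: int = 64) -> float:
    """Return the raw uop storage in KiB for a given entry count."""
    return entries * bits_per_uop / 8 / 1024


if __name__ == "__main__":
    for name, entries in (("Zen 3", ZEN3_ENTRIES), ("Zen 4", ZEN4_ENTRIES)):
        print(f"{name}: {entries} uops -> {uop_cache_kib(entries):.1f} KiB")
```

Under those assumptions the raw uop storage comes out to about 32 KiB for Zen 3 and 54 KiB for Zen 4, so in the same ballpark as or larger than a typical L1 instruction cache, which would fit with it looking so big on the die.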
> TBH, we have no idea how big the x86 tax is.
No, and it gets even more uncertain when you consider different design targets. E.g. a 1000W Threadripper targets a completely different segment than a 10W ARM Cortex. Would an ARM chip designed to run at 1000W beat the Threadripper? Who knows?
> Intel and AMD simply aren't incentivised to optimise their designs for low power consumption as their cores simply aren't used in mobile phones where ultra low power consumption is absolutely critical.
They'll keep doing their thing until they can't compete. They lost mobile and embedded, and competitors are eating into laptops and servers, where x86 continues to have a stronghold. But perf/watt matters in all segments these days, and binary compatibility is dropping in importance (e.g. compared to 20-40 years ago), largely thanks to open source.
IMO the writing is on the wall, but it will take time (especially for the very slow server market).