| "Performance is agnostic of ISA" is too strong a statement. Variable-length instruction encoding is a significant performance disadvantage, as is the strict memory ordering requirement of X86/X64. X64 decoders are indeed only ~5% of the die on a modern CPU, but it's 5% that is always at 100% utilization. That's a non-trivial amount of extra power.
X64 decode parallelism is also limited. I've heard four instructions per cycle cited as the magic number beyond which it becomes really hard. This is part of why hyperthreading (SMT) is so common on X64 chips. It's a "cheat" to keep the pipeline full by decoding two different instruction streams in parallel (allowing 8-wide decode in total). SMT isn't free though. It drags in a lot of complexity at the register file, pipeline, and scheduler levels, and is a bit of a security minefield due to Spectre-style attacks. All that complexity adds overhead and therefore power consumption, and it takes up die space that could be used for more cores, wider cores, more cache, etc.
ARM is just a lot easier to optimize and crank up performance on than X86. The M1 apparently has 8-wide instruction decode, and with fixed-length instructions it would be comparatively easy to take that to 16-wide or 32-wide if there were a benefit to it. I could definitely imagine something like a 16-wide ARM64 core at 3nm capable of achieving up to 16-way instruction-level parallelism as well as supporting really wide vector operations at really high throughput. Put like 16 of those on a die and we're really far beyond X64 performance in every category. This is also why SMT/hyperthreading is rare in the ARM world. There's less to be gained from it. Better to have a simpler core and more of them.
IMHO X86/X64 has hit a wall, at least in terms of performance per watt, and this time it might be insurmountable due to variable-length instructions and the associated overhead. It matters in the data center as well as for mobile and laptops.
There's a reason AWS is pricing to steer people toward Graviton: it costs less to run. Power is one of the largest components of data center operating costs. |
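
To make the decode argument above concrete, here is a rough Python sketch of why finding instruction boundaries is easy to parallelize with fixed-length encoding and inherently serial with variable-length encoding. The variable-length format here is a made-up toy (`toy_length` is hypothetical and far simpler than real X86 length decoding); only the 4-byte fixed width matches ARM64.

```python
from typing import List

FIXED_WIDTH = 4  # bytes per instruction, as in ARM64


def fixed_boundaries(code: bytes) -> List[int]:
    # Every start offset is known up front: decoder slot i can begin at
    # i * FIXED_WIDTH without looking at any earlier instruction.
    return list(range(0, len(code), FIXED_WIDTH))


def toy_length(first_byte: int) -> int:
    # Hypothetical toy encoding (NOT real X86): the low two bits of the
    # first byte give an instruction length of 1 to 4 bytes.
    return (first_byte & 0b11) + 1


def variable_boundaries(code: bytes) -> List[int]:
    # Each start offset depends on the length of the previous instruction,
    # so this loop is a serial dependency chain.
    offsets = []
    pos = 0
    while pos < len(code):
        offsets.append(pos)
        pos += toy_length(code[pos])
    return offsets


if __name__ == "__main__":
    print(fixed_boundaries(bytes(16)))
    stream = bytes([0x00, 0x01, 0xFF, 0x03, 0xAA, 0xBB, 0xCC, 0x02, 0x11, 0x22])
    print(variable_boundaries(stream))
```

In the fixed-width case, N decoders can each be handed their own offset in the same cycle. In the variable-width case, decoder N cannot know where to start until the lengths of the N-1 instructions before it are known. Real X86 front ends work around this with techniques such as length predecoding and micro-op caches, which is where some of that extra power and die area goes.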