by peterfirefly
881 days ago
> 1. Decoding width - There is a practical limit to how many instructions you can decode in parallel. You can add pipeline steps, but at some point it becomes absurd. Branches also make really wide decoders useless. The cost/benefit balance tilts further toward wider decoders for A64 than for AMD64. The average A64 instruction does slightly less work than the average AMD64 instruction, so the net result is that it makes sense to have slightly wider decoders (in terms of "work") for A64 than for AMD64. X86 CPUs don't quite use a "RISC-like encoding". The µops support RMW for memory, for example. The encoding is of course very much regularized, but I don't think the RISC people have a patent on that. Translation to an internal format is common for high-performance RISC CPUs as well. The Power CPUs call it "cracking" when complicated instructions are split into simpler µops.
I'm not sure that the difference is that big. A64 actually has quite powerful instructions, and some of them do more work than similar x86 instructions (madd and ubfx come to mind). In my testing A64 code often has fewer instructions than x86: https://www.bitsnbites.eu/cisc-vs-risc-code-density/
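For anyone not familiar with those two instructions, here is a minimal Python sketch of what they compute. The function names simply mirror the mnemonics and the model ignores flags and the 32-bit forms, so treat it as an illustration rather than a spec. On baseline x86-64 the same results typically take two instructions each (imul then add; shr then and), although BMI1's bextr covers the bit-field case.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers with Python ints


def madd(xn: int, xm: int, xa: int) -> int:
    """A64 MADD Xd, Xn, Xm, Xa: Xd = Xa + Xn * Xm, truncated to 64 bits."""
    return (xa + xn * xm) & MASK64


def ubfx(xn: int, lsb: int, width: int) -> int:
    """A64 UBFX Xd, Xn, #lsb, #width: extract `width` bits starting at
    bit `lsb` and zero-extend the result."""
    return (xn >> lsb) & ((1 << width) - 1)


print(madd(3, 4, 5))                 # 17
print(hex(ubfx(0xABCD1234, 8, 12)))  # 0xd12
```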
> X86 CPUs don't quite use a "RISC-like encoding". The µops support RMW for memory, for example.
I would love to learn more about that. Do you have any references? I was under the impression that internal instructions followed the load/store principle since I assume that the internal pipeline is a load/store pipeline?
> The Power CPUs call it "cracking" when complicated instructions are split into simpler µops.
Yes, it's the IBM term AFAIK. They call it cracking in zArch too. I also suspect that at least some ARMv8/9 implementations do cracking too (many AArch64 instructions have multiple results, which might be better handled as multiple internal instructions - I think it's partly a code density thing).
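To make "cracking" concrete, here is a toy Python sketch. Everything in it is hypothetical (the `Uop` tuple, the `crack` function and the instruction names are made up for illustration and do not describe any real core's internal format). It shows a post-indexed `ldp`, which has three results, being split into three single-result µops, and how a strictly load/store internal format would have to split an x86-style RMW add. Whether real hardware splits either case this way is implementation-specific.

```python
from typing import NamedTuple


class Uop(NamedTuple):
    """Hypothetical internal micro-op: one operation, one destination."""
    op: str
    dst: str
    srcs: tuple


def crack(instr: tuple) -> list:
    """Split one toy instruction into simpler single-result µops."""
    op, *args = instr
    if op == "ldp_post":
        # ldp xt1, xt2, [xn], #imm writes xt1, xt2 and the updated base.
        t1, t2, base, imm = args
        return [
            Uop("load", t1, (base, 0)),
            Uop("load", t2, (base, 8)),
            Uop("add_imm", base, (base, imm)),
        ]
    if op == "add_mem":
        # x86-style add [addr], reg under a strict load/store assumption.
        addr, reg = args
        return [
            Uop("load", "tmp", (addr, 0)),
            Uop("add", "tmp", ("tmp", reg)),
            Uop("store", addr, ("tmp",)),
        ]
    # Simple instructions map 1:1 to a single µop.
    return [Uop(op, args[0], tuple(args[1:]))]


for uop in crack(("ldp_post", "x0", "x1", "sp", 16)):
    print(uop)
```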