| ARM is RISC; Intel and AMD are CISC.
One important reason is the M1's new pipelining facility. Apple M1 has 16 units that can pipeline their instructions, meaning they can reorder sequential instructions that don't depend on each other so that they run in parallel. This has nothing to do with threads; it can be, and is, done within a single-threaded program. AMD and Intel have at most 4 such reordering units, because their architecture is CISC and one instruction can be up to 15 bytes long. M1 is RISC, and its instructions are a fixed 4 bytes long.
Thus, architecturally, it is easier to reorder instructions for RISC than for CISC. CISC used to have the edge because of its specialized instructions, but Apple has now stuffed its CPU with dedicated hardware for a lot of things, including machine learning, graphics and encryption. Instead of specialized instructions, Apple has specialized hardware, and it can make do with fewer instructions. And since Apple controls the hardware, the software SDKs and the OS, it can actually get away with such radical changes. Intel and others can't, not without a big change in the industry. Source:
https://debugger.medium.com/why-is-apples-m1-chip-so-fast-32... |
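A toy sketch of why instruction length matters for wide decoding. It assumes a made-up variable-length encoding (hypothetical: the low 4 bits of an instruction's first byte give its length); real x86 length decoding is far more involved. With fixed 4-byte instructions, every instruction's start offset is known up front, so several decoders can each grab one at once. With variable lengths, each start depends on the length of the previous instruction, so finding the boundaries is a sequential scan.

```python
from typing import List

FIXED_WIDTH = 4  # bytes per instruction in a fixed-length ISA such as AArch64


def fixed_length_boundaries(code: bytes) -> List[int]:
    """Start offsets of instructions in a fixed-length stream.

    Each offset depends only on the instruction index, so all of them can be
    computed independently (in hardware: in parallel).
    """
    return [i * FIXED_WIDTH for i in range(len(code) // FIXED_WIDTH)]


def toy_length(code: bytes, offset: int) -> int:
    """Hypothetical encoding: the low 4 bits of the first byte give the
    instruction length, kept in 1..15 bytes to mirror x86's 15-byte limit."""
    return max(1, code[offset] & 0x0F)


def variable_length_boundaries(code: bytes) -> List[int]:
    """Start offsets in the toy variable-length stream.

    Each start depends on the length of the instruction before it, so this
    scan is inherently sequential.
    """
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += toy_length(code, offset)
    return starts


print(fixed_length_boundaries(bytes(16)))
print(variable_length_boundaries(bytes([0x03, 0, 0, 0x01, 0x02, 0])))
```

The first call yields `[0, 4, 8, 12]` without looking at any byte; the second yields `[0, 3, 4]`, but only after reading each instruction's first byte in turn.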
2) Reordering is different from pipelining, and CPUs have done both for decades. The difference between the M1 and Intel/AMD is that the M1 is wider in places and can do much more extensive reordering. The M1 can decode and issue 8 instructions at a time. AMD can do 4 or 8, depending on whether instructions are being decoded from the instruction cache or supplied by a special cache of pre-decoded instructions. The M1 has a reorder buffer of over 600 instructions, meaning it can have more than 600 instructions in flight awaiting completion at once (e.g. some executing while others wait for data to come back from memory). Intel's and AMD's reorder buffers are about half that size.
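To make the reorder-buffer point concrete, here is a deliberately simplified simulation, not a model of any real core. Assumptions (all hypothetical): unlimited execution units, unlimited allocate/issue bandwidth per cycle, in-order retirement, and made-up latencies (100 cycles stands in for a cache miss, 1 cycle for an ALU op). An instruction can only start once it is inside the window of the oldest `rob_size` unretired instructions and its inputs are ready.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instr:
    latency: int  # cycles from issue until the result is ready (assumed >= 1)
    deps: List[int] = field(default_factory=list)  # indices of earlier instructions it reads


def simulate(program: List[Instr], rob_size: int) -> int:
    """Return the cycle at which the last instruction retires."""
    n = len(program)
    done_at: List[Optional[int]] = [None] * n  # cycle each result becomes ready
    retired = 0    # instructions retired so far (always in program order)
    allocated = 0  # instructions that have entered the reorder buffer
    cycle = 0
    while True:
        # Retire finished instructions in program order, freeing ROB entries.
        while retired < allocated and done_at[retired] is not None and done_at[retired] <= cycle:
            retired += 1
        if retired == n:
            return cycle
        # Fill every free ROB entry (the window is the oldest rob_size unretired).
        allocated = min(n, retired + rob_size)
        # Issue any instruction in the window whose inputs are ready this cycle.
        for i in range(retired, allocated):
            ready = all(done_at[d] is not None and done_at[d] <= cycle for d in program[i].deps)
            if done_at[i] is None and ready:
                done_at[i] = cycle + program[i].latency
        cycle += 1


# Two independent "loads", each followed by an op that uses its result.
program = [Instr(100), Instr(1, [0]), Instr(100), Instr(1, [2])]
print(simulate(program, rob_size=2))
print(simulate(program, rob_size=4))
```

With a 2-entry window the second load cannot even enter the buffer until the first one retires, so the two misses are serialized (201 cycles); with 4 entries both misses overlap (101 cycles). A window of hundreds of entries lets a real core find this kind of overlap much further ahead in the instruction stream.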
3) Special instructions and control of the software interface have little to do with performance on general-purpose code.