Hacker News
by Fazel94 2020 days ago
ARM is RISC, while Intel and AMD are CISC. One important reason for the M1's speed is its new pipelining facility.

Apple M1 has 16 units that can pipeline their instructions.

Meaning, they can reorder sequential instructions that aren't dependent on each other to run in parallel. This has nothing to do with threads; it can be, and is, done within a single-threaded program.
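To make the reordering idea concrete, here is a toy Python sketch, not a model of any real core. It assumes every instruction takes one cycle and that registers are renamed, so only true read-after-write dependencies matter. The names `producers` and `cycles_needed` and the four-instruction `program` are made up for illustration.

```python
def producers(program):
    """For each instruction, list the earlier instructions whose results it reads."""
    last_writer = {}  # register name -> index of the latest instruction writing it
    deps = []
    for i, (dest, sources) in enumerate(program):
        deps.append([last_writer[s] for s in sources if s in last_writer])
        last_writer[dest] = i
    return deps


def cycles_needed(program, width):
    """Cycles to run `program` when up to `width` ready instructions start per cycle."""
    deps = producers(program)
    done = {}  # instruction index -> cycle in which it executed
    cycle = 0
    while len(done) < len(program):
        cycle += 1
        started = 0
        for i in range(len(program)):
            if started == width:
                break
            if i in done:
                continue
            # Ready only if every producer finished in an earlier cycle.
            if all(d in done and done[d] < cycle for d in deps[i]):
                done[i] = cycle
                started += 1
    return cycle


# Hypothetical program: each entry is (destination, [sources]).
program = [
    ("a", ["x", "y"]),  # a = x + y
    ("b", ["z", "w"]),  # b = z + w, independent of a
    ("c", ["a", "b"]),  # c = a + b, must wait for both
    ("d", ["u", "v"]),  # d = u + v, independent of everything above
]

print(cycles_needed(program, width=1))
print(cycles_needed(program, width=4))
```

With `width=1` the four instructions take 4 cycles. With `width=4` they take 2, because only `c` has to wait for earlier results. No threads are involved; the program is a single instruction stream.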

AMD and Intel have 4 units for reordering, tops, because their architecture is CISC and one instruction can be up to 15 bytes. M1 is RISC, and its instructions are a fixed 4 bytes long. Thus, architecturally, it is easier to reorder instructions for RISC than for CISC.
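A rough Python sketch of why instruction length matters to a decoder. The variable-length encoding here is a made-up toy in which the first byte of each instruction holds its total length. Real x86 encoding is far more involved, and real decoders have tricks to cope, so treat this only as an illustration of the serial dependency.

```python
def fixed_boundaries(code, size=4):
    """Fixed-length ISA: every start offset is known without reading the bytes."""
    return list(range(0, len(code), size))


def variable_boundaries(code):
    """Toy variable-length ISA (assumption: byte 0 of each instruction is its length).

    The start of instruction N+1 is only known after instruction N has been
    examined, so the boundaries have to be found one after another.
    """
    offsets = []
    pos = 0
    while pos < len(code):
        length = code[pos]
        if length == 0:
            raise ValueError("toy encoding does not allow zero-length instructions")
        offsets.append(pos)
        pos += length
    return offsets


fixed_code = bytes(16)  # four 4-byte instructions
toy_code = bytes([3, 0, 0, 1, 5, 0, 0, 0, 0, 2, 0])  # lengths 3, 1, 5, 2

print(fixed_boundaries(fixed_code))
print(variable_boundaries(toy_code))
```

In the fixed-length case, eight decoders could each be pointed at their own offset immediately. In the toy variable-length case, each offset depends on the previous instruction.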

CISC used to be better because of its specialized instructions, but Apple has now stuffed its CPU with dedicated hardware for a lot of things, including machine learning, graphics processing and encryption. Instead of specialized instructions, Apple has specialized hardware and can make do with fewer instructions.

And since they control the hardware, the software SDKs and the OS, they can actually get away with such radical changes. Intel and others can't, not without a big change in the industry.

Source: https://debugger.medium.com/why-is-apples-m1-chip-so-fast-32...

1 comments

1) There’s not a meaningful difference between RISC and CISC on modern architectures. CISC has certain advantages these days because it is compact at encoding memory operations. Intel and AMD crack instructions into operations called micro-ops. There is no meaningful difference between how easy it is to reorder the micro-ops versus RISC ops. CISC or RISC, the internal structures of the processor operate on something quite different from the instruction as written in memory. (A decoded form.)

2) Reordering is different than pipelining, and CPUs have done both for decades. The difference between the M1 and Intel/AMD is that the M1 is wider in spots and can do much more extensive reordering. The M1 can decode and issue 8 instructions at a time. AMD can do 4 or 8 depending on whether the instruction is coming from memory or a special cache for pre-decoded instructions. The M1 has a reorder buffer of over 600 instructions—meaning it can have 600 instructions waiting for completion at a time (e.g. some executing while others are waiting for data to come back from memory). Intel and AMD’s reorder buffers are half the size.

3) Special instructions and controlling the software interface have little to do with performance on general purpose code.
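The effect of reorder buffer size from point 2 can be sketched with a toy Python model. It assumes all instructions are independent, start executing as soon as they enter the buffer, and retire in program order. The function `run`, the latencies and the buffer sizes are illustrative only and do not correspond to any real chip.

```python
from collections import deque


def run(latencies, rob_size, width=4):
    """Cycles until every instruction has retired, with in-order retirement."""
    rob = deque()  # completion cycle of each in-flight instruction, oldest first
    next_instr = 0
    cycle = 0
    while next_instr < len(latencies) or rob:
        cycle += 1
        # Retire finished instructions from the head only (program order).
        while rob and rob[0] <= cycle:
            rob.popleft()
        # Bring in new instructions while the buffer has room.
        dispatched = 0
        while next_instr < len(latencies) and len(rob) < rob_size and dispatched < width:
            rob.append(cycle + latencies[next_instr])
            next_instr += 1
            dispatched += 1
    return cycle


# One slow load (say, a cache miss) followed by 199 one-cycle instructions.
workload = [100] + [1] * 199

print(run(workload, rob_size=8))
print(run(workload, rob_size=256))
```

In this toy, the small buffer fills up behind the slow load and the core sits idle until the load returns, giving 149 cycles. The large buffer holds all of the remaining work while the load is outstanding, so the run finishes in 101 cycles, just after the load itself.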

I disagree, here's why:

CISC instructions are still variable length. People can argue that micro-ops are RISC-like, but microcode is an implementation detail very close to the hardware.
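For anyone unfamiliar with the micro-ops being argued about, here is a toy Python sketch of "cracking". The tuple format, the `tmp` register and the `crack` function are invented for illustration. Real micro-op formats are internal to each vendor and look nothing like this.

```python
def crack(instruction):
    """Split one CISC-style (op, dest, src) instruction into simpler steps.

    Operands written as "[...]" are treated as memory; everything else is a register.
    """
    op, dest, src = instruction
    if dest.startswith("["):
        # Memory destination: read, modify, write back.
        return [("load", "tmp", dest), (op, "tmp", src), ("store", dest, "tmp")]
    if src.startswith("["):
        # Memory source: load first, then operate on registers only.
        return [("load", "tmp", src), (op, dest, "tmp")]
    # Register-to-register: already a single simple operation.
    return [(instruction)]


print(crack(("add", "[rbx]", "rax")))
print(crack(("add", "rax", "[rbx]")))
print(crack(("add", "rax", "rcx")))
```

The point of contention is that this splitting happens inside the processor at run time, so a compiler never sees or schedules the resulting steps.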

One of the key ideas of RISC was to push a lot of heavy lifting over to the compiler. That is still the case. Micro-ops cannot be re-arranged by the compiler for optimal execution.

Time is more critical when running micro-ops than when compiling. Making it possible for an advanced compiler to rearrange code, rather than relying on precious silicon to do it, is an obvious advantage.

While RISC processors have gotten more specialized instructions over the years, e.g. for vector processing, they still lack the complexity of memory access modes that many CISC instructions have.
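As a rough illustration of what a complex memory access mode means, here is a Python sketch of the x86-style base + index * scale + displacement address calculation next to the same work spelled out as simple steps. The second function models a hypothetical load/store machine limited to base + displacement addressing. Real RISC ISAs vary in which modes they offer, and some do have scaled-index forms, so this is only a sketch of the trade-off.

```python
def effective_address(base, index, scale, disp):
    """One memory operand does the whole calculation (x86-style addressing)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return base + index * scale + disp


def simple_steps(base, index, scale, disp):
    """The same address built from separate simple operations (hypothetical machine)."""
    shift = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    steps = []
    t = index << shift
    steps.append("shl  t, index, shift")
    t = t + base
    steps.append("add  t, t, base")
    address = t + disp
    steps.append("load r, [t + disp]")
    return address, steps


# Hypothetical values: element 5 of an array of 8-byte items, 16 bytes into a struct.
address, steps = simple_steps(base=0x1000, index=5, scale=8, disp=16)
print(hex(effective_address(0x1000, 5, 8, 16)), hex(address), len(steps))
```

Both routes compute the same address. The CISC-style form packs it into one instruction's operand, while the simple form spends three instructions that a compiler is free to schedule.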