| >whereas with a variable length instruction stream you need all sorts of interconnects between the decode units, and these interconnects add significant complexity and latency. I find it worth noting that this is not always the case. E.g. the RISC-V C extension provides variable-length instructions, but they're still either 16 or 32 bits. Special care has been put into making the decoding overhead of this negligible, and it indeed is. There's a benefit, transistor-budget-wise, the moment there's any on-die cache or on-die ROM. Any chip smaller than that is going to be very specialized and can simply omit C. In any larger chip, C is a net benefit. As a practical example, the RISC-V based Ascalon by Jim Keller's team is an 8-wide (like M1), 10-issue CPU. However, you're absolutely right that the wild sort of variable instruction length seen in CISC architectures like x86 is a huge issue that massively complicates implementations and outright imposes a practical limit on decoder width. OTOH, in aarch64 the adoption of a fixed instruction size, thus tanking code density, was unenlightened to the point of brain-dead: we see the cache sizes M1/M2 need just to deal with this, and I'm afraid ARM will be gone for other reasons (non-technical, to do with mismanagement) before they have a chance to correct course and re-introduce compressed instructions. As for the rest of the article, I generally agree with you that it presents outright wrong information as fact and then tries to push the wrong conclusion. It is utter bull; practically nothing of value can be found in there. I'm not even surprised, as this is pretty much the norm in RISC opposition. |
It's more than that. In RISC-V, you only need the two lowest bits of each instruction to determine whether it's a 16-bit or a 32-bit instruction (both bits set means 32-bit); you don't need to decode an instruction to know its length.
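To make that concrete, here is a rough sketch of the length check in Python. The function names are made up for illustration, and it only covers the standard 16-bit and 32-bit encodings (the longer formats the spec reserves are ignored):

```python
def rv_instruction_length(first_parcel: int) -> int:
    """Length in bytes of an instruction, given its first 16-bit parcel.

    Assumption: only standard 16/32-bit encodings occur. Low bits 0b11
    mean a 32-bit instruction; anything else is a 16-bit compressed one.
    """
    return 4 if (first_parcel & 0b11) == 0b11 else 2


def instruction_offsets(code: bytes) -> list[int]:
    """Byte offsets at which instructions start in a little-endian stream."""
    offsets = []
    pc = 0
    while pc + 2 <= len(code):
        parcel = int.from_bytes(code[pc:pc + 2], "little")
        offsets.append(pc)
        pc += rv_instruction_length(parcel)
    return offsets


# c.nop (0x0001), nop (0x00000013), c.nop (0x0001)
stream = bytes.fromhex("0100" "13000000" "0100")
print(instruction_offsets(stream))  # [0, 2, 6]
```

Hardware doesn't walk the stream serially like this loop does, but the point stands: the length test is a two-bit comparison per 16-bit parcel, with no real decoding involved.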
> [...] we see the cache sizes M1/M2 need just to deal with this, [...]
Do the M1/M2 need these cache sizes, or do they have these cache sizes because they can, due to having a 4x larger page size by default (16 KiB instead of 4 KiB)? An L1 cache that is indexed with untranslated address bits can only grow to page size times associativity before aliasing becomes a problem. (Normally, page size wouldn't be that much of a problem for instruction caches, but for x86 it is, because the x86 ISAs don't require explicit instruction cache invalidation on self-modifying code; x86 processors would likely have larger L1 instruction caches if they could get away with it.)
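For a feel of the numbers, here is the back-of-the-envelope arithmetic behind the page-size argument. It assumes a cache indexed purely by page-offset bits, and the 8-way associativity is just an illustrative value, not a claim about any particular core:

```python
def max_alias_free_cache_bytes(page_size: int, ways: int) -> int:
    """Upper bound on the size of a cache indexed only by page-offset bits.

    Each way can hold at most one page worth of data, so the total is
    page size times associativity.
    """
    return page_size * ways


KIB = 1024
WAYS = 8  # hypothetical associativity, for illustration only

print(max_alias_free_cache_bytes(4 * KIB, WAYS) // KIB)   # 32  (4 KiB pages)
print(max_alias_free_cache_bytes(16 * KIB, WAYS) // KIB)  # 128 (16 KiB pages)
```

So at the same associativity, a 4x larger page gives 4x the headroom for the L1 without any extra aliasing machinery.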