by snvzz 1337 days ago
>whereas with a variable length instruction stream you need all sorts of interconnects between the decode units, and these interconnects add significant complexity and latency.

I find it worth noting that this is not always the case.

e.g. RISC-V C extension provides variable length instructions, but they're still either 16 or 32 bit.

Special care has been put into making the decoding overhead of dealing with this situation negligible, and it is indeed so. There's a benefit, transistor-budget-wise, the moment there's any on-die cache or on-die ROM. Any chip that's smaller than that is going to be very specialized and can simply omit C. In any chip that's larger, C is a net benefit.

As a practical example, the RISC-V based Ascalon by Jim Keller's team is an 8-wide (like M1), 10-issue CPU.

However, you're absolutely right that the wild sort of variable instruction length seen in CISC architectures like x86 is a huge issue that massively complicates implementations and outright imposes a practical limit on decoder width.

OTOH in aarch64, the adoption of a fixed instruction size, thus tanking code density, was unenlightened to the point of brain-dead; we see the cache sizes M1/M2 need just to deal with this. I'm afraid ARM will be gone for other reasons (non-technical, to do with mismanagement) before they have a chance to correct course and re-introduce compressed instructions.

As for the rest of the article, I generally agree with you that it presents outright wrong information as facts and then tries to push the wrong conclusion. It is utter bull, practically nothing of value can be found in there. I'm not even surprised, as it is pretty much the norm in RISC opposition.


> e.g. RISC-V C extension provides variable length instructions, but they're still either 16 or 32 bit.

It's more than that. In RISC-V, you only need the first two bits of each instruction to determine whether it's a 16 bit or 32 bit instruction; you don't need to decode an instruction to know its length.

> [...] we see the cache sizes M1/M2 need just to deal with this, [...]

Do the M1/M2 need these cache sizes, or do they have these cache sizes because they can have these cache sizes, due to having a 4x larger page size by default? (Normally, page size wouldn't be that much of a problem for instruction caches, but for x86 it is because the x86 ISAs don't require explicit instruction cache invalidation on self-modifying code; x86 processors would likely have larger L1 instruction cache sizes if they could get away with it.)

> In RISC-V, you only need the first two bits of each instruction to determine whether it's a 16 bit or 32 bit instruction

Isn't it one bit in the beginning(?) of each 16-bit instruction? So a 32-bit instruction has this information duplicated in the same place in the latter 16-bit half, since a decoder has to be able to decide whether it's trying to decode a 16-bit instruction or whether it's in the middle of a 32-bit instruction.

The above assumes that the common strategy for implementing a parallel decoder for RVC is to start decoding at each 16-bit offset and then throw away the cases where it turns out to have been in the middle of a 32-bit instruction, and that RVC has been designed with this implementation strategy in mind.

> Isn't it one bit in the beginning(?) of each 16-bit instruction?

No, it's the first two bits of every instruction (RISC-V is little-endian, so these are the least-significant bits). Two bits have four possible values, three of them are for 16-bit instructions, one of them is for 32-bit instructions.
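To make the two-bit rule concrete, here is a minimal sketch in Python. The function name `rvc_length_bytes` is made up for illustration, and it assumes only 16-bit and 32-bit instructions exist, as in this thread (the longer encodings the spec reserves are ignored). The input is the first 16-bit parcel of an instruction, read as a little-endian halfword.

```python
def rvc_length_bytes(first_parcel: int) -> int:
    """Return the instruction length in bytes, given its first 16-bit parcel.

    Assumption: only 16-bit and 32-bit instructions are in use.
    The two least-significant bits decide: 0b11 means 32-bit,
    while 0b00, 0b01 and 0b10 all mean 16-bit.
    """
    return 4 if (first_parcel & 0b11) == 0b11 else 2


# 0x4501 is c.li a0, 0 (low bits 01); 0x0513 is the low half of
# addi a0, a0, 1 (0x00150513, low bits 11).
print(rvc_length_bytes(0x4501), rvc_length_bytes(0x0513))
```

Note that nothing else in the instruction is examined; the length is known before any real decoding starts.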

> So a 32-bit instruction has this information duplicated in the same place in the latter 16-bit half, since a decoder has to be able to decide whether it's trying to decode a 16-bit instruction or whether it's in the middle of a 32-bit instruction.

No, that information is not duplicated. The decoder cannot know whether it's in the middle of a 32-bit instruction or not; it has to decode the length of all preceding instructions. That's why it's important that you can know the instruction length without decoding the instruction, so that simple logic can tell decoders other than the first whether they're in the start or in the middle of an instruction.
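A rough software model of that "simple logic", as a sketch only: the name `mark_instruction_starts` and the `first_is_start` parameter are hypothetical, only 16-bit and 32-bit instructions are assumed, and real hardware would compute this with parallel logic rather than a loop. It walks the 16-bit parcels of a fetch block and flags which ones begin an instruction, using nothing but the low two bits of each instruction's first parcel.

```python
def mark_instruction_starts(parcels, first_is_start=True):
    """Flag which 16-bit parcels of a fetch block begin an instruction.

    Returns (starts, carry). carry is True when the last instruction in the
    block is 32-bit and its second half lies in the next block.
    Assumption: only 16-bit and 32-bit instructions exist.
    """
    starts = []
    in_middle = not first_is_start  # True if the block opens mid-instruction
    for parcel in parcels:
        if in_middle:
            # Second half of a 32-bit instruction: its low bits mean nothing.
            starts.append(False)
            in_middle = False
        else:
            starts.append(True)
            # Low two bits 0b11 mean this instruction takes the next parcel too.
            in_middle = (parcel & 0b11) == 0b11
    return starts, in_middle


# c.li a0, 0 ; addi a0, a0, 1 (two parcels: 0x0513, 0x0015) ; c.jr ra
block = [0x4501, 0x0513, 0x0015, 0x8082]
print(mark_instruction_starts(block))
```

The third parcel, 0x0015, has low bits 01 and would look like a 16-bit instruction if taken in isolation. Only the length of the preceding instruction reveals that it is the upper half of the `addi`.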

Huh, that's surprising. I looked it up and indeed you're correct. Well, oof. Though to be fair, I don't know how much of an impediment that is for actually implementing very wide decoders in practice. Hopefully not too bad.
Relative to x86 (where you need brute force, with complexity growing geometrically with decode width), it is so cheap you could call it free.