by snvzz 815 days ago
>It is a lot harder with variable-length instrs where the length cannot even be calculated from the first byte - you need to read 10 bytes in the worst case to find an instr's len in x86. In aarch64 you need to read 0 bytes to know the length - it is 4

x86's approach to variable-length instructions is unfortunate.

In contrast, RISC-V leverages variable-length encoding to get the best code density among 64-bit ISAs while sidestepping the instruction boundary problem.

(I digress, but note that while RISC-V's code density for the 32-bit ISA used to be competitive yet bested by ARM Thumb-2, it has since improved; RISC-V now has the best density overall.)

1 comments

The length of a RISC-V instruction is in the first byte though, not the tenth.
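To make the "first byte" point concrete, here is a minimal sketch of length decoding for the two ratified cases (16-bit compressed and 32-bit). The function names are made up for illustration, and the longer-than-32-bit prefixes are deliberately rejected rather than decoded:

```python
from typing import List


def rv_insn_length(first_byte: int) -> int:
    """Return the instruction length in bytes, given only the first byte.

    RISC-V instructions are stored little-endian, so the first byte in
    memory holds bits [7:0] of the instruction.
    Only the 16-bit and 32-bit cases are handled in this sketch.
    """
    if first_byte & 0b11 != 0b11:
        return 2  # low two bits are not 11: 16-bit compressed instruction
    if first_byte & 0b11100 != 0b11100:
        return 4  # low two bits are 11, bits [4:2] are not 111: 32-bit
    raise ValueError("prefix of a longer-than-32-bit encoding; not handled here")


def split_instructions(code: bytes) -> List[bytes]:
    """Walk a byte string and cut it at instruction boundaries."""
    out = []
    pc = 0
    while pc < len(code):
        n = rv_insn_length(code[pc])
        out.append(code[pc:pc + n])
        pc += n
    return out


# nop, c.nop, ret, c.jr ra (byte order as they sit in memory)
stream = bytes.fromhex("13000000" "0100" "67800000" "8280")
chunks = split_instructions(stream)
```

Each boundary is found by looking at one byte, which is the contrast with x86 that the quoted comment is drawing.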

Note that RISC-V's code density with the C extension is measured in bytes, not in number of instructions. The core integer ISA was designed to be extensible from small embedded MCUs, so every other chip has to use it. High-performance RISC-V cores depend a lot on macro-op fusion to run as fast as 64-bit ARM.

>not in number of instructions.

This comes up very often, but it is an unfounded concern. Not only is the instruction count competitively low, but as it turns out, critical paths of inter-dependent instructions are, even in the worst case (without fusion or the 2019+ extensions), no worse than on aarch64[0].

>The core integer ISA was designed to be extensible from small embedded MCUs, so every other chip has to use it.

There's so much to unpack here. Firstly, the ISA, as documented in the specification itself[1], is described as "An ISA separated into a small base integer ISA, usable by itself as a base for customized accelerators or for educational purposes, and optional standard extensions, to support general-purpose software development." Note there's no reference to small embedded MCUs in there.

Furthermore, the spec elaborates: "An ISA that avoids “over-architecting” for a particular microarchitecture style (e.g., microcoded, in-order, decoupled, out-of-order) or implementation technology (e.g., full-custom ASIC, FPGA), but which allows efficient implementation in any of these."

>High-performance RISC-V cores depend a lot on macro-op fusion to run as fast as 64-bit ARM.

This is news to me. There seems to be some confusion here. 64-bit ARM (aarch64) is implemented in a range of microarchitectures, targeting different uses. I will go ahead and assume (for convenience) that you meant specifically very high performance implementations, as used in workstations and servers.

These tend to be superscalar and very wide (Apple M1 and Tenstorrent Ascalon are both 8-wide). Their execution units tend to be simpler; instead, there are more of them, and some can only do specific tasks. Typically, for the instruction sequences that are candidates for macro-op fusion on RISC-V, an ARM microarchitecture has to split a single complex instruction into multiple micro-ops, whereas in RISC-V the same work already arrives as separate simple instructions.
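For readers unfamiliar with the term, here is a toy sketch of what macro-op fusion means at the decode stage. It is an illustration only: the tuple format, the `fused_shadd` name and the single `slli` + `add` pattern are my own assumptions, not a description of how any real core does it.

```python
def fuse_pairs(insns):
    """Merge adjacent 'slli rd, rs1, k ; add rd, rd, rs2' pairs into one op.

    Instructions are modelled as (opcode, rd, rs1, rs2_or_imm) tuples.
    The pair is fused only when the add overwrites the shift result,
    so the intermediate value is never visible to later instructions.
    """
    out = []
    i = 0
    while i < len(insns):
        if i + 1 < len(insns):
            a, b = insns[i], insns[i + 1]
            if (a[0] == "slli" and b[0] == "add"
                    and b[1] == a[1]      # add writes the same register
                    and b[2] == a[1]      # add reads the shift result
                    and b[3] != a[1]):    # other operand is independent
                # hypothetical internal op: rd = (rs1 << k) + rs2
                out.append(("fused_shadd", a[1], a[2], a[3], b[3]))
                i += 2
                continue
        out.append(insns[i])
        i += 1
    return out


program = [
    ("slli", "t0", "a0", 3),     # t0 = a0 << 3
    ("add", "t0", "t0", "a1"),   # t0 = t0 + a1
    ("ld", "t1", "t0", 0),       # t1 = mem[t0]
]
fused = fuse_pairs(program)
```

A core without this logic simply executes the two simple instructions as they are. Fusion is an optional optimisation on top, not something the ISA requires.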

0. https://dl.acm.org/doi/pdf/10.1145/3624062.3624233

1. https://riscv.org/technical/specifications/ (unprivileged architecture)