|
|
|
|
|
by monocasa
967 days ago
|
|
Because RISC-V was designed to be trivial to decode length for, you simply need to look at the top two bits of each 16bit word to tell if it's a 32bit or 16bit instruction. At that point, spending the extra I$ budget isn't worth it. Those 20 'simple decoders' are literally just each one 2nand gate. Adding complexity to the I$ hasn't even made sense for x86 in two decades, because of the extra area needed for the I$ versus the extra decode logic. And that's a place where this extra decode is legitimately an extra pipeline stage. > I don't know how flexible their design is, it's quite possible adding an entire extra pipeline stage is a big deal. Much bigger than just rewriting the instruction decoders to 32bit RISC-V. I'm sure it is legitimately simpler for them. I'm not sure we should bend over backwards and bring down the rest of the industry because they don't want to do it. Veyron, Tenstorrent were showing off high perf designs with RV-C. |
|
For an 8-wide or 10-wide design, the propagation delays are getting too long to do it in all in single cycle. So you need the extra pipeline stage. The longer pipeline translates to more cycles wasted on branch mispredits.
RISC-V code is only about 6-14% denser than Aarch64 [1], I'm really not sure the extra complexity is worth it. Especially since Aarch64 still ends up with a lower instruction count, so it will be faster whenever you are decode limited instead of icache limited.
> Adding complexity to the I$ hasn't even made sense for x86 in two decades
Hang on. Limiting the Icache to only 32bit aligned access actually simplifies it.
And since the NUVIA core was originally an aarch64 core, why wouldn't they optimise for hardcoded 32bit alignment and get a slightly smaller Icache?
[1] https://www.bitsnbites.eu/cisc-vs-risc-code-density/