| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by phire 975 days ago

It doesn't matter how optimised the length decoding is. Not doing it is still faster.

For an 8-wide or 10-wide design, the propagation delays are getting too long to do it in all in single cycle. So you need the extra pipeline stage. The longer pipeline translates to more cycles wasted on branch mispredits.

RISC-V code is only about 6-14% denser than Aarch64 [1], I'm really not sure the extra complexity is worth it. Especially since Aarch64 still ends up with a lower instruction count, so it will be faster whenever you are decode limited instead of icache limited.

> Adding complexity to the I$ hasn't even made sense for x86 in two decades

Hang on. Limiting the Icache to only 32bit aligned access actually simplifies it.

And since the NUVIA core was originally an aarch64 core, why wouldn't they optimise for hardcoded 32bit alignment and get a slightly smaller Icache?

[1] https://www.bitsnbites.eu/cisc-vs-risc-code-density/

2 comments

monocasa 975 days ago

> Hang on. Limiting the Icache to only 32bit aligned access actually simplifies it.

Even x86 only reads 16 or 32 byte aligned fields out of the I$, then shifts them. There's not extra I$ complexity. You still have to do that shift at some point, in case you don't jump 32 byte aligned address. You also ideally don't want to only hit peak decode bandwidth starting on aligned 32 byte program counters, so that whole shift register thing is pretty much a requirement. And that's where most of the propagation delays are.

> RISC-V code is only about 6-14% denser than Aarch64 [1], I'm really not sure the extra complexity is worth it. Especially since Aarch64 still ends up with a lower instruction count, so it will be faster whenever you are decode limited instead of icache limited.

There's heavy use of fusion, and fwiw, the M1 also heavily fuses into micro ops too (and I'm sure the AArch64 morph of NUVIA's cores do too).

link

Symmetry 975 days ago

Under a classic RISC architectures you can't jump to non-aligned addresses. That lets you specify jumps that are 4 times longer for the same number of bits in your jump instruction. Here's MIPS as an example:

https://en.wikibooks.org/wiki/MIPS_Assembly/Instruction_Form...

link

monocasa 975 days ago

Classic RISC was targeting about 20k gates and isn't really applicable here.

link

Symmetry 975 days ago

AArch64 does the same thing.

https://valsamaras.medium.com/arm-64-assembly-series-branch-...

And it's not only a way of decreasing code size. It help with security too. If you can have an innocuous looking bit of binary starting at address X that turns into a piece of malware if you dump to instruction X+1 that's a serious problem.

https://mainisusuallyafunction.blogspot.com/2012/11/attackin...

RISC-V, I'm pretty sure, enforces 16 bit alignment and is self synchronizing so it doesn't suffer from this despite being variable length. But if it allowed the PC to be pointed at an instruction with a 1 byte offset then it might be.

As far as I'm aware every RISC ISA that's had any commercial succss does this. HP RISC, SPARC, POWER, MIPS, Arm, RISC-V, etc.

link

monocasa 975 days ago

> And it's not only a way of decreasing code size. And RISC-V has better code density than AArch64.

> It help with security too. If you can have an innocuous looking bit of binary starting at address X that turns into a piece of malware if you dump to instruction X+1 that's a serious problem.

JIT spraying attacks work just fine on aligned architectures too, hence why Linux hardened the AArch64 BPF JIT as well: https://linux-kernel.vger.kernel.narkive.com/M0Qk08uz/patch-...

Additionally, MIPS these days has a compressed extension to their ISA too, heavily inspired by RV-C. https://mips.com/products/architectures/nanomips/

link

Symmetry 975 days ago

Not all JIT spraying relies on byte offsets to get past JIT filters, the attack I gave is just an example.

And NanoMips requires instructions to be word aligned just like everybody else, it's just that it requires 16 bit alignment rather than 32. Attempting to access an odd PC address will result in an access error according to this:

https://s3-eu-west-1.amazonaws.com/downloads-mips/I7200/I720...

link

hajile 975 days ago

Back in 2019, RISC-V was 15-20% smaller than x86 (up to 85% smaller in some cases) and was 20-30% smaller than ARM64 (up to 50% smaller in some cases).

https://project-archive.inf.ed.ac.uk/ug4/20191424/ug4_proj.p...

Since then, RISC-V has added a bunch more instructions that ARM/x86 already had which has made RISC-V even smaller relative to them.

link