| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by snvzz 1360 days ago

>Variable length instruction coding for instance, which means a surprising amount of power is dedicated to circuitry which is just to find where the instruction boundaries are for speculative execution.

This does apply to x86 and m68k, as "variable" there means 1-16 byte, and dealing with that means bruteforcing decode at every possible starting point. Intel and AMD have both thus found 4-wide decode to be a practical limit.

It does not apply to RISC-V, where you get either 32bit or 2x 16bit. The added complexity of using the C extension is negligible, to the point where if a chip has any cache or rom in it, using C becomes a net benefit in area and power.

Therefore, ARMv8 AArch64 made a critical mistake in adopting a fixed 32bit opcode size. A mistake we can see in practice when looking at the L1 cache size that Apple M1 needed to compensate for poor code density.

L1 is never free. It is always *very* costly: Its size dictates area the cache takes, clocks the cache itself can achieve (which in turns caps the speed of the CPU), and power the cache draws.

1 comments

renox 1360 days ago

Maybe. If I remember well, Apple ARM M1 can decode up to 8 instruction at the same time, is-there any RISC-V CPU with the C extension which is able to decode 8 instructions?

snvzz 1360 days ago

>is-there any RISC-V CPU with the C extension which is able to decode 8 instructions?

Sure, there's Ascalon[0], 8-decode 10-issue, by Jim Keller's team at Tenstorrent. It isn't in the market yet, but is bound to be among the first RISC-V chips targeting very high performance.

Note that, at that size (8-decode implies lots of execution units, a relatively large design*), the negligible overhead of C extension is invisible. There's only gains to be had.

C extension decode overhead would only apply in the comically impractical scenario of a core that has neither L1 Cache nor any ROM in the chip. Otherwise, it is a net win.

And such a specialized chip would simply not implement C.

0. https://youtu.be/yHrdEcsr9V0?t=346

(*: 8-wide decode being small is just Jim Keller's idea of a joke)