|
|
|
|
|
by dzaima
676 days ago
|
|
Expanding on 3: I think it ends up at O(n^2 * log n) transistors, O(log n) critical path (not sure on routing or what fan-out issues might there be). Basically: determine end of instruction at each byte (trivial but expensive). Determine end of two instructions at each byte via end2[i]=end[end[i]]. Then end4[i]=end2[end2[i]], etc, log times. That's essentially log(n) shuffles. With 32-byte/cycle decode that's roughy five 'vpermb ymm's, which is rather expensive (though various forms of shortcuts should exist - for the larger layers direct chasing is probably feasible, and for the smaller ones some special-casing of single-byte instructions could work). And, actually, given the mention of O(log n)-transistor shuffles at http://www.numberworld.org/blogs/2024_8_7_zen5_avx512_teardo..., it might even just be O(n * log^2(n)) transistors. Importantly, x86 itself plays no part in the non-trivial part. It applies equqlly to the RISC-V compressed extension, just with a smaller constant. |
|
x86 requires checking for a REX, REX2, VEX, EVEX, etc prefix. Then you must check for either 1 or 2 instruction bytes. Then you must check for the existence of a register byte, how many immediate byte(s), and if you use a scaled index byte. Then if a register byte exists, you must check it for any displacement bytes to get your final instruction length total.
RISC-V starts with a small complexity then multiplies it by a small amount. x86 starts with a high complexity then multiplies it by a big amount. The real world difference here is large.
As I pointed out elsewhere ARM's A715 dropped support for aarch32 (which is still far easier to decode than x86) and cut decoder size by 75% while increasing raw decoder count by 20%. The decoder penalties of bad ISA design extend beyond finding instruction boundaries.