|
Some notes: 3: I don't think more decoders should be exponentially more complex, or even polynomial; I think O(n log n) should suffice. It just has a hilarious constant factor due to the lookup tables and logic needed, and that log factor also impacts the critical path length, i.e. pipeline length, i.e. mispredict penalty. Of note is that x86's variable-length instructions aren't even particularly good at code size. Golden Cove (~1y after M1) has 6-wide decode, which is probably reasonably near M1's 8-wide given x86's complex instructions (mainly free single-use loads). [EDIT: actually, no, chipsandcheese's diagram shows it only moving 6 micro-ops per cycle to reorder buffer, even out of the micro-op cache. Despite having 8/cycle retire. Weird.] 6: The count of extensions is a very bad way to measure things; RISC-V will beat everything in that in no time, if not already. The main things that matter are ≤SSE4.2 (uses same instruction encoding as scalar code); AVX1/2 (VEX prefix); and AVX-512 (EVEX). The actual instruction opcodes are shared across those. But three encoding modes (plus the three different lengths of the legacy encoding) is still bad (and APX adds another two onto this) and the SSE-to-AVX transition thing is sad. RISC-V already has two completely separate solutions for SIMD - v (aka RVV, i.e. the interesting scalable one) and p (a simpler thing that works in GPRs; largely not being worked on but there's still some activity). And if one wants to count extensions, there are already a dozen for RVV (never mind its embedded subsets) - Zvfh, Zvfhmin, Zvfbfwma, Zvfbfmin, Zvbb, Zvkb, Zvbc, Zvkg, Zvkned, Zvknhb, Zvknha, Zvksed, Zvksh; though, granted, those work better together than, say, SSE and AVX (but on x86 there's no reason to mix them anyway). And RVV might get multiple instruction encoding forms too - the current 32-bit one is forced into allowing using only one register for masking due to lack of encoding space, and a potential 48-bit and/or 64-bit instruction encoding extension has been discussed quite a bit. 8: RISC-V RVV can be pretty problematic for some things if compiling without a specific target architecture, as the scalability means that different implementations can have good reason to have wildly different relative instruction performance (perhaps most significant being in-register gather (aka shuffle) vs arithmetic vs indexed load from memory). |
6. Not all of those SIMD sets are compatible with each other. Some (eg, SSE4a) wound up casualties of the Intel v AMD war. It's so bad that the Intel AVX10 proposal is mostly about trying to unify their latest stuff into something more cohesive. If you try to code this stuff by hand, it's an absolute mess.
The P proposal is basically DOA. It could happen, but nobody's interested at this point. Just like the B proposal subsumed a bunch of ridiculously small extensions, I expect a new V proposal to simply unify these. As you point out, there isn't really any conflict between these tiny instruction releases.
There is discussion around the 48-bit format (the bits have been reserved for years now), but there are a couple different proposals (personally, I think 64-bit only with the ability to put multiple instructions inside is better, but that's another topic). Most likely, a 48-bit format does NOT do multiple encoding, but instead does a superset of encodings (just like how every 16-bit instruction expands into a 32-bit instruction). They need/want 48-bits to allow 4-address instructions too, so I'd imagine it's coming sooner or later.
Either way, the length encoding is easy to work with compared to x86 where you must check half the bits in half the bytes before you can be sure about how long your instruction really is.
8. There could be some variance, but x86 has this issue too and SO many more besides.