|
Rotate and popcount are very specialised instructions. The vast majority of software doesn't use them at all, or uses them so infrequently that a software implementation is fine. You are confusing embedded applications, which have huge flexibility with RISC-V, with standard operating systems running packaged software. For the next few years (5?) standard operating systems have to support exactly two choices: RV64GC and RVA22. RVA22 includes all the bit manipulation instructions, vectors, cache management, scalar crypto, and some other stuff. You can't pick and choose -- you have to support it all. If you are making an embedded appliance, on the other hand, you can pick and choose exactly what extensions you need (a huge number of combinations, as you say), specify a core with exactly those extensions, build a chip around that with the other IP blocks you need, and tell your compiler which extensions you have. You compile all your software yourself, whether bare metal, using an RTOS, or a minimal Linux built with Buildroot or Yocto. There is zero confusion because you know what you have and you have what you need -- no more and no less. No one who knows what they are talking about is talking about fusing five-instruction sequences. That's a total red herring. |
The assertion that rotate and popcount instructions are unimportant is false. Mainstream optimizing compilers such as GCC and LLVM recognize the shift-and-or idiom and emit a rotate instruction wherever the target has one, and they would not bother if nobody needed it. There is a long history of misjudging the importance of instructions, going back to the effort once spent optimizing an instruction that turned out to be used only in a kernel idle loop.
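For anyone who has not seen it, here is roughly what that idiom looks like in C. This is a minimal sketch: the name `rotl32` is mine, and which instruction you actually get depends on your compiler and `-march` setting (on RISC-V with Zbb enabled I would expect something from the `rol`/`ror` family, but check the generated assembly yourself).

```c
#include <stdint.h>

/* Portable rotate-left written with shifts and an OR.
 * Both shift counts are masked into 0..31, so the expression has no
 * undefined behaviour even when n is 0 or n >= 32.
 * rotl32 is an illustrative name, not a standard library function. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << (n & 31u)) | (x >> (-n & 31u));
}
```

On a target without a rotate instruction the same source compiles to two shifts and an OR, which is the cost every hash, cipher, and checksum loop pays when the instruction is missing.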
A more objective measure is to note how often a neglected instruction has needed to be added after the first ISA version shipped, because its lack handicapped the chips on the market. Popcount wins that race everywhere: always neglected, always added. Its neglect reveals the blinders of the CS academics who do the initial ISA designs, and the need to patch reveals the reality.
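To make the cost of the missing instruction concrete, this is the kind of software fallback people mean. It is a sketch, not anyone's production code: the function names are mine, and `__builtin_popcountll` is the GCC/Clang builtin, which those compilers can lower to a single instruction when the target provides one (`cpop` under Zbb, as far as I know) and to a library call or inline sequence otherwise.

```c
#include <stdint.h>

/* Software popcount using the classic SWAR (bit-parallel) reduction:
 * sum bits in pairs, then nibbles, then bytes, then add up all the
 * bytes with one multiply. Roughly a dozen ALU operations per call. */
static inline unsigned popcount64_sw(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ull);
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0full;
    return (unsigned)((x * 0x0101010101010101ull) >> 56);
}

/* Prefer the compiler builtin where it exists, so that the hardware
 * instruction is used whenever the build flags allow it. */
static inline unsigned popcount64(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcountll(x);
#else
    return popcount64_sw(x);
#endif
}
```

A dozen dependent operations against one is invisible in most programs and decisive in the few inner loops (bitboards, bitmap indexes, Hamming distance) that are built around it.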
The importance of an instruction is poorly represented by both its static frequency and its total execution frequency, for the same reason that idle-loop instruction was miscounted: the importance of lines of code varies by many orders of magnitude, and counting has no way to weigh importance. It is easy to prioritize instructions used in signature benchmarks, but they are a cracked mirror.
The market is another cracked mirror: it takes a very large signal to penetrate it. Any that does merits attention.