| My example is applicable to compiler / assembler / JIT / emulator. The performance of conventional compilers and assemblers is not important to anyone but developers, but everyone uses JavaScript / WebAsm all the time. And QEMU can be important too (e.g. in Docker for non-native ISAs, using binfmt_misc). I guess I should point out that in the proposed RISC-V example it's 6 bytes of code, as the initial shift can be a 2-byte "C" extension instruction. So that's slightly smaller code than everything except 32-bit PowerPC, and code size is another important aspect. Arm64 and M68k use 8 bytes of code. Oh! I just realised standard RISC-V can be improved in this case (but not by so much in the general case).
srli x12, x10, 20 # shift field down to correct position
andi x12, x12, 0x7FE # mask to 10 bits
andi x11, x11, ~0x7FE # clear space in the destination
or x11, x11, x12 # insert the field
That's just 12 bytes of code. In the more general case you need a `lui` or a `lui;addi` pair to load the mask into a register, and then register-to-register ops, for 14 bytes total. Note that x86_64 needs four instructions and 14 bytes of code, so no better than RISC-V. |
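
For anyone who would rather read the four-instruction sequence above as C, here is a minimal sketch of the same shift / mask / clear / merge steps. The names `insert_field` and `insert_example` are hypothetical, and the use of `uint32_t` assumes 32-bit registers (RV32); on RV64 the registers would be 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the original comment): shift `src` right by
   `shift`, keep only the bits selected by `mask`, and merge them into `dst`. */
static inline uint32_t insert_field(uint32_t dst, uint32_t src,
                                    unsigned shift, uint32_t mask)
{
    assert(shift < 32);                      /* shifting by >= the width is undefined in C */
    uint32_t field = (src >> shift) & mask;  /* srli, then andi with the mask */
    return (dst & ~mask) | field;            /* andi with ~mask, then or */
}

/* The specific case from the assembly: shift by 20, mask 0x7FE.
   Assumes 32-bit registers. */
static inline uint32_t insert_example(uint32_t dst, uint32_t src)
{
    return insert_field(dst, src, 20, 0x7FEu);
}
```

In this specific case both `0x7FE` and `~0x7FE` fit in a 12-bit signed immediate, which is why two `andi` instructions are enough. A wider mask would not fit, so it would have to be loaded into a register first, which is the general case mentioned above.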