|
|
|
|
|
by Sesse__
253 days ago
|
|
I never understood why they couldn't just include a movmskb instruction to begin with. It's massively useful for integer tasks, not expensive to implement as far as I know, and the vshrn+mov trick often requires an extra instruction either in front or after (in addition to just being, well, pretty obscure). NEON in general is a bit sad, really; it's built around the idea of being implementable with a 64-bit ALU, and it shows. And SVE support is pretty much non-existent on the client. |
|
I personally think calibrating ARM's ISA on smaller VL was a good choice: you get much better IPC. You also have an almost-complete absence of support for 8-bit elements with x86 ISAs, so elements per instruction is tied. And NEON kind-of-ish makes up for its small VL with multi-register TBL/TBX and LDP/STP.
Also: AVX512 is just as non-existent on clients as SVE2; although not really relevant for the server-side targets I'm optimising for (mostly OLAP).