|
|
|
|
|
by ashtonsix
255 days ago
|
|
Not having (1 << k) - 1 as a single instruction sucks when it HAS to be in a hot loop, but you can usually hoist this to the loop prolougue: my stuff uses dummy inline assembly hints to force compilers to do this `asm volatile("" : "+w"(m));`. I personally think calibrating ARM's ISA on smaller VL was a good choice: you get much better IPC. You also have an almost-complete absence of support for 8-bit elements with x86 ISAs, so elements per instruction is tied. And NEON kind-of-ish makes up for its small VL with multi-register TBL/TBX and LDP/STP. Also: AVX512 is just as non-existent on clients as SVE2; although not really relevant for the server-side targets I'm optimising for (mostly OLAP). |
|
Also AVX512 is way more common than SVE2, all Zen4 & Zen5 support it.