by capyba 224 days ago
Interesting, how so? I’ve had really good success with the autovectorization in GCC and the Intel C compiler. Often it’s faster than my own intrinsics, though not always. One notable exception is that it seems to struggle with reductions: when I’m updating large arrays, e.g. `A[i] += a`, the compiler struggles to use SIMD for this and I need to do it myself.
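
For concreteness, here is a minimal sketch of the kind of loop being described, next to a true reduction for contrast. The function names `add_scalar` and `sum_array` are made up for illustration. Whether either loop actually gets vectorized depends on the compiler, version, and flags; GCC, for instance, generally will not vectorize the floating-point sum unless reassociation is allowed (e.g. `-ffast-math`), because reordering the additions can change the result.

```c
#include <stddef.h>

/* Hypothetical helper: elementwise update, the `A[i] += a` pattern.
   Each iteration is independent, so this is the easy case for an
   autovectorizer in principle. */
void add_scalar(float *A, size_t n, float a) {
    for (size_t i = 0; i < n; ++i) {
        A[i] += a;
    }
}

/* Hypothetical helper: a true reduction. Every iteration depends on
   the previous value of `sum`, so vectorizing it requires the compiler
   to reorder the floating-point additions. */
float sum_array(const float *A, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += A[i];
    }
    return sum;
}
```

Compiling with something like `gcc -O3 -fopt-info-vec` is one way to see which of the two loops the compiler reports as vectorized.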

There's no optimal portable `movemask` operation, because aarch64 NEON doesn't have an equivalent instruction.
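
To illustrate, here is a rough sketch of what a "portable" wrapper ends up looking like. The name `movemask_u8x16` is hypothetical. On x86 with SSE2 it maps to the single `_mm_movemask_epi8` intrinsic; everywhere else this sketch falls back to a scalar loop, which stands in for the multi-instruction emulation you would have to write on NEON.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: gather the most significant bit of each of
   16 bytes into the low 16 bits of the result (bit i <- byte i). */
uint32_t movemask_u8x16(const uint8_t bytes[16]) {
#if defined(__SSE2__)
    /* x86: one instruction does the whole job. */
    __m128i v = _mm_loadu_si128((const __m128i *)bytes);
    return (uint32_t)_mm_movemask_epi8(v);
#else
    /* Fallback for targets without a native movemask (assumed here to
       include aarch64 NEON): extract and pack the bits one by one. */
    uint32_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        mask |= (uint32_t)(bytes[i] >> 7) << i;
    }
    return mask;
#endif
}
```

Both paths return the same mask, but only the x86 one is cheap, which is why code built around `movemask` tends not to port well.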