by ack_complete 1295 days ago
This is always deeply frustrating. You quickly get the sense that the person you're talking to hasn't experienced anything beyond simple float loops that are trivial for the compiler to autovectorize, or really bad examples of hand vectorization.

In the meantime, I constantly encounter algorithms that compilers fail to vectorize because even a single vector instruction, such as a saturating integer add, is too complex for the compiler to pattern-match. The compiler fails to autovectorize and the difference in performance is >5x. Even for something as simple as adding up unsigned bytes, all three major compilers generate vector code that's much slower than a simple hand-written loop leveraging absolute difference instructions.
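To make the two examples above concrete, here is a rough sketch in C. The function names are made up for illustration. The scalar saturating add is the pattern the compiler would have to recognize as one vector instruction. For the byte sum, this assumes that "absolute difference instructions" on x86 means PSADBW (`_mm_sad_epu8`) against zero, and it falls back to the plain loop when SSE2 isn't available. No performance numbers are claimed for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar saturating add of unsigned bytes. The widen-add-clamp sequence
   is what a compiler has to match to a single vector instruction
   (PADDUSB on x86, UQADD on ARM). Hypothetical helper name. */
void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned s = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}

/* The "simple loop" version of summing unsigned bytes. */
uint64_t sum_u8_scalar(const uint8_t *p, size_t n)
{
    assert(p != NULL);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Hand-vectorized byte sum. |x - 0| == x for unsigned bytes, so a
   sum-of-absolute-differences against zero adds 8 bytes per 64-bit lane
   in one instruction. */
uint64_t sum_u8_sad(const uint8_t *p, size_t n)
{
    assert(p != NULL);
    size_t i = 0;
    uint64_t total = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        /* Each 64-bit lane of the result holds the sum of 8 bytes. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1];
#endif
    /* Remaining tail bytes (or everything, without SSE2). */
    for (; i < n; i++)
        total += p[i];
    return total;
}
```

Both sum functions return the same value for any input; the only difference is how the work is expressed to the compiler.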

That's even before running into the more complex operations that would require the compiler to match half a dozen lines of code, like ARM's Signed Saturating Rounding Doubling Multiply Accumulate returning High Half: https://developer.arm.com/architectures/instruction-sets/int...
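For a sense of what the compiler would have to match, here is a scalar sketch of one 16-bit lane of that instruction (mnemonic SQRDMLAH). This is my reading of the documented semantics and has not been checked against hardware: widen, double the product, add it to the accumulator shifted into the high half, round, take the high half, saturate. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of a saturating rounding doubling multiply-accumulate
   returning the high half. Assumed semantics, per lane:
     result = sat16(((acc << 16) + 2*a*b + (1 << 15)) >> 16)            */
int16_t sqrdmlah_s16(int16_t acc, int16_t a, int16_t b)
{
    /* Multiply instead of shifting so negative accumulators stay
       well-defined in C. */
    int64_t wide = (int64_t)acc * 65536
                 + 2 * ((int64_t)a * (int64_t)b)
                 + (1 << 15);                     /* rounding constant */

    /* Floor division by 65536 without relying on how >> treats
       negative values. */
    int64_t hi = (wide >= 0) ? (wide >> 16)
                             : -((-wide + 65535) >> 16);

    /* Saturate to the int16_t range. */
    if (hi > INT16_MAX) hi = INT16_MAX;
    if (hi < INT16_MIN) hi = INT16_MIN;

    assert(hi >= INT16_MIN && hi <= INT16_MAX);
    return (int16_t)hi;
}
```

A compiler would need to prove that a loop body like this is equivalent to the single instruction, including the rounding and both saturation bounds, before it could emit it.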

Or cases where the compiler is not _allowed_ to apply vector optimizations by itself, because changes to data structures are required.
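A typical case of this is array-of-structs versus struct-of-arrays. The sketch below uses made-up types and function names. Both functions do the same thing, but only the second touches contiguous memory, and the compiler cannot turn the first layout into the second because the struct layout is visible to the rest of the program.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structs: the x values are 16 bytes apart, so a vector load
   would pull in y/z/w that this loop doesn't need. */
typedef struct { float x, y, z, w; } PointAoS;

void translate_x_aos(PointAoS *pts, size_t n, float dx)
{
    assert(pts != NULL);
    for (size_t i = 0; i < n; i++)
        pts[i].x += dx;
}

/* Struct-of-arrays: each component is its own contiguous array, which
   maps directly onto vector loads and stores. Switching to this layout
   is a change the programmer has to make. */
typedef struct { float *x, *y, *z, *w; } PointsSoA;

void translate_x_soa(PointsSoA *pts, size_t n, float dx)
{
    assert(pts != NULL && pts->x != NULL);
    float *x = pts->x;
    for (size_t i = 0; i < n; i++)
        x[i] += dx;
}
```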