Hacker News
by Jensson 1611 days ago
Depends. If the function is vectorizable, the CPU can process more elements at a time when it doesn't have to do the branch prediction work. It does hold for non-vectorizable work.
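To make the distinction concrete, here is a minimal sketch in C of the same reduction written with a per-element branch and as a select. The function names are hypothetical, and whether a given compiler vectorizes either form (or compiles both to the same code) depends on the compiler, target, and flags, so treat this as an illustration rather than a benchmark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form: as written, one conditional jump per element,
 * which the branch predictor has to track. */
int64_t sum_positive_branchy(const int32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0) {
            sum += a[i];
        }
    }
    return sum;
}

/* Select form: every element takes the same path, so the loop body
 * maps naturally onto a compare-and-blend over several elements at once. */
int64_t sum_positive_select(const int32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (a[i] > 0) ? a[i] : 0;
    }
    return sum;
}
```

Both functions return the same result for any input; the only difference is the shape of the loop body the compiler starts from.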

In autovectorized loops, the generated code typically still needs length checks (or static proofs about the length) to handle the tail elements that don't fill a whole vector. But yes, there are still cases where the cost of the branch can be measurable.
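A rough sketch in C of the loop shape this refers to, written out by hand: a main loop that only runs while a full chunk remains, followed by a scalar tail. `LANES` and `sum_chunked` are hypothetical names, and the width of 4 is an assumption for illustration; real autovectorizer output varies by compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector width, chosen only for illustration. */
#define LANES ((size_t)4)

int64_t sum_chunked(const int32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    int64_t acc[LANES] = {0};
    size_t i = 0;

    /* Length check: enter the main loop only while a full chunk remains.
     * This comparison is itself a branch, even in "branch-free" vector code. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t l = 0; l < LANES; l++) {
            acc[l] += a[i + l];
        }
    }

    /* Combine the per-lane partial sums. */
    int64_t sum = 0;
    for (size_t l = 0; l < LANES; l++) {
        sum += acc[l];
    }

    /* Scalar tail: the n % LANES leftover elements. */
    for (; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

If the length is known at compile time to be a multiple of the vector width, the tail loop can be dropped entirely, which is the "static length proof" case.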