I did, and I truly do not understand why some people do this.
As shown in the reddit comments on this article [1], the initial intrinsics version was quite suboptimal and clearly worse than portable code [2].
When not busy unnecessarily rewriting everything for each ISA, it is easier to see and have time for vital optimizations such as unrolling :)
When not busy unnecessarily rewriting everything for each ISA, it is easier to see and have time for vital optimizations such as unrolling :)
[1]: https://www.reddit.com/r/cpp/comments/1gzob1g/understanding_... [2]: https://github.com/google/highway/blob/master/hwy/contrib/do...