I did, and I truly do not understand why some people do this.
As shown in the reddit comments on this article [1], the initial intrinsics version was quite suboptimal and clearly worse than portable code [2].
When not busy unnecessarily rewriting everything for each ISA, it is easier to see and have time for vital optimizations such as unrolling :)