by phire 2006 days ago
No. For many algorithms, AVX isn't a 2x speedup over SSE. Especially when lanes are conditionally masked.

Often you are happy to get a 1.25x speedup with AVX. Sometimes it actually runs slower.

If you took code that gets a 1.25x speedup from AVX and emulated it on the M1, you would end up with all the disadvantages of going 8-wide, but with none of the speedup.

That 1.25x speedup is halved, since each 256-bit AVX operation has to be emulated as two 128-bit NEON operations, and the emulated AVX code actually runs at about 0.625x the speed of the emulated SSE code path.
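
To spell out the arithmetic, here is a rough back-of-the-envelope sketch. It assumes a deliberately simple model that is mine, not a measurement: every 256-bit AVX operation is split into two 128-bit operations under emulation, emulated SSE maps one-to-one, and nothing else changes. The function name and the `split_factor` parameter are made up for illustration.

```python
def emulated_avx_relative_speed(native_avx_speedup: float, split_factor: int = 2) -> float:
    """Speed of emulated AVX relative to emulated SSE.

    Assumed model: each AVX op costs `split_factor` narrower ops when
    emulated, so the native AVX-over-SSE speedup is divided by that factor.
    """
    return native_avx_speedup / split_factor


# Code that gains 1.25x from AVX natively ends up slower than the SSE path.
print(emulated_avx_relative_speed(1.25))  # 0.625

# Under this model, AVX only breaks even when emulated if it was a full 2x win natively.
print(emulated_avx_relative_speed(2.0))  # 1.0
```

Real emulation overhead will not be exactly 2x per operation, so treat the numbers as an illustration of the direction of the effect rather than a prediction.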