Hacker News
by King1st 851 days ago
ARM has abysmal SIMD support. It doesn't even support anything like 256-bit AVX; for reference, that is about a 10-year lag behind x86. NEON is not an adequate substitute. Additionally, when AVX is used, the power draw of an ARM chip skyrockets to x86 levels, defeating the efficiency advantage ARM has over x86 while offering worse performance. ARM is good for workloads that need many threads and don't depend heavily on very high integer performance per thread. x86 dominates at high power, where per-thread performance matters more because of application requirements or size limits, or where the workload needs to do anything with SIMD. ARM has a lot of push behind it, but contrary to techbro hype, it is not a drop-in replacement for x86, and I don't think it ever will be without shooting itself in the foot and becoming less efficient.

Focussing on vector length is an error. You should care about throughput and number of different instructions you can retire.

If I have a core with four 128-bit NEON vector units, I have the same throughput as an x86 core with two AVX2 units or one AVX-512 unit. However, that 4x128-bit core is actually more flexible than the other two, as I can do 4 different things at once, or 4 scalar operations per cycle. (Of course, the downside is that you spend more front-end resources on decode.)
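A quick sketch of the arithmetic behind that claim, assuming each unit can start one full-width operation every cycle (real cores add port, latency and dependency limits that this ignores). The three configurations are the hypothetical ones from the comment, not specific chips, and `peak_bits_per_cycle` is just an illustrative helper:

```python
# Peak vector bits issued per cycle, assuming every unit can start one
# full-width operation each cycle (ignores ports, latency, and dependencies).
def peak_bits_per_cycle(units: int, width_bits: int) -> int:
    return units * width_bits


# Hypothetical core configurations from the comment: (unit count, width in bits).
configs = {
    "4 x 128-bit NEON": (4, 128),
    "2 x 256-bit AVX2": (2, 256),
    "1 x 512-bit AVX-512": (1, 512),
}

for name, (units, width) in configs.items():
    # The unit count is also the number of independent operations
    # (vector or scalar) that can start in the same cycle.
    print(f"{name}: {peak_bits_per_cycle(units, width)} bits/cycle, "
          f"{units} independent op(s)/cycle")
```

All three come out at 512 bits per cycle; what differs is how many independent operations can start in the same cycle.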

Given that most code isn't vector code, the approach of several short vector units is actually superior on many common real-world workloads other than machine learning (and the CPU is the unit of last resort for large ML workloads anyway).
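A toy issue model can make the "more flexible" argument concrete. It is only a sketch under loud assumptions: scalar floating-point ops run on the same FP/SIMD pipes as vector ops (as they do on mainstream x86 and ARM cores), every op is independent, and a vector job covering 512 bits of data needs ceil(512 / width) unit-cycles. The function `cycles_needed` and the workload sizes are made up for illustration:

```python
import math


def cycles_needed(units: int, width_bits: int,
                  scalar_ops: int, vector_jobs: int, job_bits: int = 512) -> int:
    """Toy issue model for a core's FP/SIMD pipes (illustrative only).

    Assumes every operation is independent, each unit starts one operation
    per cycle, a scalar FP op occupies a whole unit for one cycle, and a
    vector job over `job_bits` of data needs ceil(job_bits / width_bits)
    unit-cycles.
    """
    slots = scalar_ops + vector_jobs * math.ceil(job_bits / width_bits)
    return math.ceil(slots / units)


# Hypothetical designs: (unit count, width in bits).
designs = {"4x128": (4, 128), "2x256": (2, 256), "1x512": (1, 512)}

for label, scalar, vector in [("pure vector", 0, 10), ("mostly scalar", 100, 10)]:
    results = {name: cycles_needed(u, w, scalar, vector)
               for name, (u, w) in designs.items()}
    print(label, results)
```

With no scalar work all three designs tie at 10 cycles; once scalar FP ops dominate, the design with more, narrower pipes finishes first (35 vs 60 vs 110 cycles). The model ignores decode width, which is exactly the front-end cost the parent comment concedes.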

ARM also offers the very capable SVE instruction set as an optional extension, but no consumer chips currently implement it.

The M1 and later big cores can dispatch four NEON FMA instructions per clock, so 512 bits' worth of vector math, which compares OK with most Intel or AMD chips (Zen 4 can do two 256-bit MUL and two 256-bit ADD, and Intel "client" big cores since Sunny Cove typically do two 256-bit FMA).
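Converting those unit counts into peak FP32 FLOPs per cycle makes the comparison explicit. This takes the M1 and Zen 4 figures from the comment at face value, counts an FMA lane as 2 FLOPs and a MUL or ADD lane as 1, and leaves out clock speed and sustained throughput; `flops_per_cycle` is an illustrative helper, not a real API:

```python
# Peak FP32 FLOPs per cycle from the unit counts quoted in the comment,
# taken at face value. An FMA lane counts as 2 FLOPs (multiply + add);
# a MUL or ADD lane counts as 1.
FP32_BITS = 32


def flops_per_cycle(units: int, width_bits: int, flops_per_lane: int) -> int:
    lanes = width_bits // FP32_BITS  # FP32 lanes per vector register
    return units * lanes * flops_per_lane


m1 = flops_per_cycle(4, 128, 2)          # four 128-bit NEON FMA pipes
zen4 = (flops_per_cycle(2, 256, 1)       # two 256-bit MUL pipes
        + flops_per_cycle(2, 256, 1))    # two 256-bit ADD pipes

print(f"M1 big core: {m1} FP32 FLOPs/cycle")
print(f"Zen 4:       {zen4} FP32 FLOPs/cycle")
```

Both work out to 32 FP32 FLOPs per cycle per core, which is the sense in which the M1 "compares OK"; real kernels rarely sustain these peaks.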

>when AVX is used

spoiler: it isn't

Then why did Intel and AMD add it to their chips? My guess is that AVX is used where it is worth the effort: probably scientific computing, maybe video processing (as in movies, TV, etc.), maybe audio processing, and so on.