by King1st
851 days ago
ARM has abysmal SIMD support. It does not even offer 256-bit vectors on par with AVX2; for reference, that is about a 10-year lag behind x86. NEON is not an adequate substitute. Additionally, when SIMD is used heavily, the power draw of an ARM core skyrockets to x86 levels, defeating the advantage ARM has over x86 while offering worse performance. ARM is good for things that require many threads and are not heavily dependent on very high integer performance per thread. x86 is dominant at high power, when per-thread performance matters more due to application requirements or size limits, or when the workload needs to do anything with SIMD. ARM has a lot of push behind it. But contrary to techbro hype, it is not a drop-in replacement for x86, and I don't think it ever will be without shooting itself in the foot and becoming less efficient.
If I have a core with four 128-bit NEON vector units, I have the same peak throughput as an x86 core with two AVX2 units or one AVX-512 unit. However, that 4x128-bit core is actually more flexible than the other two, as I can do four different things at once, or four scalar operations per cycle. (Of course, the downside is that you spend more frontend resources on decode.)
Given that most code isn't vector code, the multiple short vector length approach is actually superior on many common real-world workloads that aren't machine learning (and the CPU is the unit of last resort for large ML workloads anyway).
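To make the throughput comparison concrete, here is a rough back-of-the-envelope sketch. It assumes each vector unit can issue one full-width operation per cycle and ignores clock speed, FMA, and memory bandwidth; the function names and the 32-bit element size are made up for illustration, not taken from any real core's spec sheet.

```python
# Peak SIMD width per cycle under a deliberately simple model:
# every vector unit issues one full-width operation each cycle.
# All names here are hypothetical and only illustrate the arithmetic.

def peak_bits_per_cycle(units, width_bits):
    """Total vector bits processed per cycle across all units."""
    return units * width_bits

def peak_lanes_per_cycle(units, width_bits, elem_bits=32):
    """Number of elem_bits-wide elements processed per cycle."""
    return units * (width_bits // elem_bits)

# (number of vector units, width of each unit in bits)
configs = {
    "4 x 128-bit NEON": (4, 128),
    "2 x 256-bit AVX2": (2, 256),
    "1 x 512-bit AVX-512": (1, 512),
}

for name, (units, width) in configs.items():
    bits = peak_bits_per_cycle(units, width)
    lanes = peak_lanes_per_cycle(units, width)
    # 'units' is also the number of independent operations that can
    # be in flight per cycle, which is the flexibility argument.
    print(f"{name}: {bits} bits/cycle, {lanes} fp32 lanes/cycle, "
          f"{units} independent ops/cycle")
```

Under this model, all three configurations reach the same peak width, and only the number of independent operations per cycle differs (4, 2, and 1). Real cores will diverge from this depending on issue width, frequency, and which operations each unit supports.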