by brigade
1477 days ago
Yeah, that's NEON. And there are instructions that literally calculate SHA256, so generalizing from that is moot. My point was, first: what real benchmarks are there of SVE2's benefits over NEON on the mainstream CPUs that M2 would compete against? Unlike AVX-512, NEON was already pretty rich, so the new instructions have rather specialized usefulness. That matters because, outside of servers (where little cores don't exist), 256b ALUs in the big cores mean 256b registers in the little cores, and Cortex-A510 is way smaller than Gracemont. And then you're giving Samsung another opportunity to screw up big.LITTLE... And even the server CPUs with SVE are 2x256b (except A64FX, which is HPC-exclusive), so no better than 4x128b.

The purpose of SVE2 is to simplify writing software that exploits data parallelism, whether that is done by hand or automatically by an autovectorizing compiler.
With SVE2 it should become much easier to deal with data structures whose sizes and alignments are not multiples of the ALU width. It will also no longer be necessary to write many alternative code paths to take advantage of future, better CPUs, as is the case when optimizing for Intel SSE/AVX/AVX2/AVX-512.
Most programs still do not use the existing SIMD units as often as they could. With SVE2, their number should diminish.
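
To make the point about sizes that are not multiples of the ALU width concrete, here is a rough sketch in C of a vector-length-agnostic, predicated loop. The function names `add_arrays` and `add_arrays_emulated` are made up for illustration. When the compiler targets SVE, the loop uses the ACLE intrinsics from `<arm_sve.h>`; otherwise it falls back to a plain scalar emulation of the same idea, where the lane count (8 floats, i.e. a pretend 256b vector) is purely an assumption. It is not tuned code, just the shape of the loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Scalar emulation of a predicated loop (hypothetical helper).
 * 'lanes' plays the role of the hardware vector length in floats.
 * The per-lane test stands in for the predicate a WHILELT
 * instruction would produce: lanes past the end of the array are
 * simply inactive, so no separate tail loop is needed. */
void add_arrays_emulated(float *dst, const float *a, const float *b,
                         size_t n, size_t lanes)
{
    for (size_t i = 0; i < n; i += lanes) {
        for (size_t lane = 0; lane < lanes; lane++) {
            int active = (i + lane) < n;   /* emulated predicate bit */
            if (active)
                dst[i + lane] = a[i + lane] + b[i + lane];
        }
    }
}

/* dst[i] = a[i] + b[i] for any n. The same source works for any
 * vector width the CPU happens to implement. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() = number of 32-bit lanes per vector on this CPU. */
    for (uint64_t i = 0; i < (uint64_t)n; i += svcntw()) {
        /* Predicate is true for lanes where i + lane < n. */
        svbool_t pg = svwhilelt_b32(i, (uint64_t)n);
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, dst + i, svadd_f32_x(pg, va, vb));
    }
#else
    /* Assumption for illustration only: pretend 8 lanes (256b). */
    add_arrays_emulated(dst, a, b, n, 8);
#endif
}
```

Calling `add_arrays(dst, a, b, 11)` goes through the same loop whether the vector holds 4, 8, or 16 floats; with SSE/AVX/AVX-512 you would typically write one main loop per width plus a scalar tail.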