by inkyoto
1477 days ago
Low-level runtime optimisation that yields substantial performance gains in user-facing or system-level software, ranging from cryptography through to data-processing algorithms and very high-throughput JSON parsing. Take OpenSSL as an isolated example. By simply fiddling with the C compiler flags to allow it to use NEON on M1, the sha256 speed-up is 4x for the 128 and 256 block sizes, with the gains quickly tapering off for larger block sizes and resulting in only a modest 10% increase. And that performance increase happens without the hash functions having been manually optimised for NEON/SVE1. SVE2, with its variable vector size support, could improve performance for larger unit sizes. Perhaps it is time to spin up a Graviton3 instance and poke around with clang/gcc to see how much faster SVE2 actually is.
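To make the "variable vector size" point concrete, here is a toy model in Python of a vector-length-agnostic loop. It is not SVE code and it measures nothing. It only shows that one loop body, which asks for the lane count instead of hard-coding it, gives the same answer at any register width while taking fewer iterations on wider registers. The function name `vla_sum`, the 32-bit element size and the sample data are all assumptions made for illustration.

```python
def vla_sum(data, vector_bits, element_bits=32):
    """Sum `data` with a loop that never hard-codes the lane count.

    Toy model only: `vector_bits` stands in for the hardware vector
    length, which SVE code discovers at run time.
    """
    lanes = vector_bits // element_bits   # elements per vector register
    acc = [0] * lanes                     # one partial sum per lane
    iterations = 0
    for start in range(0, len(data), lanes):
        chunk = data[start:start + lanes]
        # Tail handling: on the last pass, lanes past the end of the
        # data simply stay inactive (what a predicate does in SVE).
        for lane, value in enumerate(chunk):
            acc[lane] += value
        iterations += 1
    return sum(acc), iterations


data = list(range(1, 1001))               # hypothetical input, 1..1000
for bits in (128, 256, 512):
    total, iters = vla_sum(data, bits)
    print(f"{bits}-bit vectors: sum={total}, loop iterations={iters}")
# 128-bit vectors: sum=500500, loop iterations=250
# 256-bit vectors: sum=500500, loop iterations=125
# 512-bit vectors: sum=500500, loop iterations=63
```

Iteration count is not the same thing as wall-clock time, so whether wider vectors actually help sha256 still has to be measured on real hardware.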
|
Because outside of servers, where little cores don't exist, 256b ALUs in big cores mean 256b registers in little cores, and Cortex-A510 is way smaller than Gracemont. And then you're giving Samsung another opportunity to screw up big.LITTLE...
And even the server CPUs with SVE are 2x256b, except A64FX, which is HPC-exclusive, so no better than 4x128b.
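The "no better than 4x128b" arithmetic, spelled out as a rough sketch: peak SIMD width per cycle is taken to be the number of vector pipes times the width of each. The helper name `peak_bits_per_cycle` is made up for this example, and the model deliberately ignores latency, issue ports and memory bandwidth.

```python
def peak_bits_per_cycle(pipes, width_bits):
    """Idealised peak SIMD width per cycle: pipes x register width."""
    return pipes * width_bits


# The two layouts compared in the comment above.
sve_2x256 = peak_bits_per_cycle(2, 256)
neon_4x128 = peak_bits_per_cycle(4, 128)

print(sve_2x256, neon_4x128)   # 512 512
```

Both layouts come out at 512 bits per cycle under this model, which is the whole point: wider registers on fewer pipes do not raise the ceiling.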