by modeless
1477 days ago
Stuff like adding SVE2 can be great for specific applications but it's really marginal when looking at whole system performance. What's not marginal are the improvements in power efficiency and room for more cache that come with new process nodes. These chips are power constrained in almost everything they do, because of heat dissipation or battery life or both. Less power and more cache benefit everything automatically, not just the very few things that actually start using new SIMD instructions or other new hardware blocks each year.
|
It is not. A recent paper (https://arxiv.org/pdf/2205.05982.pdf) from Google engineers compares a vectorised (SIMD) and a non-vectorised implementation of quicksort in the Highway library, and also compares the AVX-512 implementation against the NEON/SVE1 ones. Switching to SIMD processing alone is reported to give a 9-19x speedup, depending on the SIMD unit size (32/64/128-bit numbers were sampled and measured). Even the smaller of the two, the 9x performance gain, is far from marginal.
On the SIMD unit size side of things, the AVX-512 implementation averaged 1120 Mb/sec while the NEON implementation averaged 478 Mb/sec, so NEON/SVE1 throughput is about 2.4x lower, largely due to the narrower processing units. Again, a 2.4x factor is not in marginal territory.
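For anyone who wants to check the arithmetic, here is a minimal sketch using only the two averages quoted above. The vector widths (512 bits for AVX-512, 128 bits for NEON) are the architectural register sizes. The helper name `lanes` is made up for illustration and is not from the paper or from Highway. The measured ratio comes out a little under the rounded 2.4x and well below the 4x difference in lane count, so width explains much of the gap but not all of it.

```python
# Average throughputs as quoted in this comment (units as quoted).
avx512_throughput = 1120
neon_throughput = 478

# Architectural vector register widths in bits.
AVX512_BITS = 512
NEON_BITS = 128


def lanes(vector_bits: int, key_bits: int) -> int:
    """Hypothetical helper: number of keys that fit in one vector register."""
    return vector_bits // key_bits


ratio = avx512_throughput / neon_throughput
print(f"AVX-512 / NEON throughput ratio: {ratio:.2f}x")  # prints 2.34x

for key_bits in (32, 64, 128):
    print(
        f"{key_bits}-bit keys: "
        f"{lanes(AVX512_BITS, key_bits)} lanes (AVX-512) vs "
        f"{lanes(NEON_BITS, key_bits)} lanes (NEON)"
    )
```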
> What's not marginal is the improvements in power efficiency that come with new process nodes.
And that is an optimisation step, albeit a very important one. However, on its own it will not make a quicksort implementation run 2.4x faster.