by lifthrasiir
660 days ago
Because it would be slower when the exact calculation is not necessary. The xsum paper does have performance numbers, but all of them came from processors that are at least a decade old, and almost every result indicates that superaccumulators are still 4-8x slower than the simple sum (but faster than the traditional Kahan summation). Superaccumulators require extensive scatter-gather operations due to their large memory footprint, and I think the gap has probably widened since then, as they would be harder to vectorize efficiently.
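For anyone who wants to see the accuracy side of this trade-off, here is a rough sketch in Python. It is illustrative only: `naive_sum` and `kahan_sum` are hypothetical helper names, and `math.fsum` stands in as the exactly rounded reference. Note that `math.fsum` is not a superaccumulator (CPython implements it with a different exact-summation technique), so this says nothing about xsum's speed, only about what "exact" buys you.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point sum (the 'simple sum')."""
    # An explicit loop is used on purpose: since Python 3.12 the built-in
    # sum() applies compensated summation to floats, so it is not naive.
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan compensated summation: carries a running error term."""
    total = 0.0
    comp = 0.0  # estimate of the low-order bits lost so far
    for x in values:
        y = x - comp            # apply the correction to the next input
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost
        total = t
    return total


if __name__ == "__main__":
    data = [0.1] * 10**6
    exact = math.fsum(data)  # correctly rounded sum of the stored doubles
    print("naive error:", abs(naive_sum(data) - exact))
    print("kahan error:", abs(kahan_sum(data) - exact))
```

Kahan does roughly four floating-point operations per element with a serial dependency between them, which is part of why it tends to be slow relative to a vectorized simple sum.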
|
If the data are confined to a certain range of exponents, one could reduce the size of the accumulator, perhaps significantly.
Re 4-8x -- the large option in xsum was benchmarked at less than 2x the cost of a direct sum. Not so bad?