by kardos
659 days ago
My takeaway from the linked post is that the author is more concerned with floating-point invariance across platforms than with speed (although improvements in speed are of course welcome). If the data are confined to a certain range of exponents, one could reduce the size of the accumulator, perhaps significantly. Re 4-8x -- the large option in xsum was benchmarked at less than 2x the cost of a direct sum. Not so bad?
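To make the invariance point concrete, here is a minimal sketch. It does not use xsum: it uses Python's standard `math.fsum` as a stand-in for an exactly rounded sum, and `naive_sum` plus the two input lists are made up for illustration. A plain left-to-right sum changes with the order of the inputs, while the exactly rounded sum does not.

```python
import math

def naive_sum(values):
    # Plain left-to-right accumulation in double precision.
    total = 0.0
    for v in values:
        total += v
    return total

# Same four numbers in two different orders (hypothetical data).
a = [1e16, 1.0, -1e16, 1.0]
b = [1.0, 1.0, 1e16, -1e16]

# The naive sum depends on the order of the inputs.
print(naive_sum(a), naive_sum(b))

# math.fsum returns the correctly rounded result, so order does not matter.
# This is a stand-in for a superaccumulator, not xsum itself.
print(math.fsum(a), math.fsum(b))
```

The same order dependence is what shows up across platforms when a compiler or library vectorizes or reorders a direct sum differently.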
|
> Re 4-8x -- the large option in xsum was benchmarked at less than 2x the cost of a direct sum. Not so bad?
I don't know where you got that number, because xsum-paper.pdf clearly indicates a larger performance difference. I'm specifically looking at the ratio between the minimum of any superaccumulator result and the minimum of any simple sum result. Among the results I think are relevant today (x86-64, no earlier than 2012), the AMD Opteron 6348 is the only case where the actual difference is only about 1.5x; everything else hovers much higher.