|
|
|
|
|
by boulos
660 days ago
|
|
This seems to keep coming up, and I see confusion in the comments. There is a standard: IEEE 754-2008. There are additional things people add like approximate reciprocals and approximate sqrt. But if you don't use those, and you don't make an association error, you get consistent results. The question here with association for summation is what you want to match. OP chose to match the scalar for-loop equivalent. You can just as easily make an 8-wide or 16-wide "virtual vector" and use that instead. I suspect that an 8-wide virtual vector is the right default for people currently, since systems since Haswell support it, all recent AMD, and if you're using vectorization, you can afford to pay some overhead on Arm with a double-width virtual vector. You don't often gain enough from AVX512 to make the default 16-wide, but if you wanted to focus on Skylake+ (really Cascadelake+) or Genoa+ systems, it would be a fine choice. |
|
Isn't it the other way around? The scalar for-loop was changed to match the vector loop's associativity. "To solve this problem for astcenc I decided to change our reference no-SIMD implementation to use 4-wide vectors."