by electricshampo1
1928 days ago
That is essentially the approach mentioned in the article at:

"UPDATE: see https://www.realworldtech.com/forum/?threadid=200693&curpost... for a dramatic simplification. Not catching this is an oversight on my part. This post will be updated to include numbers with the mentioned strategy. UPDATE: To my surprise and after much fiddling, I did not manage to write a version that was measurably faster (indeed they were at least a percent slower) than the hand written sum_avx512 shown below. There is almost certainly something that I am doing wrong but I can’t seem to figure out what it is. I will take this opportunity to leave this as an exercise for the reader :)."