by plesner 1924 days ago
Could you not just do

sum(byteVals) + sum(intVals) + 128 * len(intVals)?
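
For anyone who wants to check the identity, here is a minimal sketch in Python. It assumes the layout the formula seems to imply, which is not spelled out in this thread: each -128 in byteVals is a sentinel standing for the next entry in intVals. The names SENTINEL, sum_reference and sum_closed_form are made up for illustration.

```python
SENTINEL = -128  # assumed marker byte, inferred from the 128 in the formula


def sum_reference(byte_vals, int_vals):
    """Walk the bytes, swapping each sentinel for the next wide value."""
    total = 0
    wide = iter(int_vals)
    for b in byte_vals:
        total += next(wide) if b == SENTINEL else b
    return total


def sum_closed_form(byte_vals, int_vals):
    """Each sentinel contributes -128 to sum(byte_vals), so add 128 back per wide value."""
    return sum(byte_vals) + sum(int_vals) + 128 * len(int_vals)


# Hypothetical sample data: two sentinels, two wide values.
byte_vals = [3, -128, -7, 100, -128]
int_vals = [1000, -500]

print(sum_reference(byte_vals, int_vals))    # 596
print(sum_closed_form(byte_vals, int_vals))  # 596
```

The closed form only holds if the number of sentinels equals len(intVals), which is again an assumption about the encoding.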

2 comments

That is essentially the approach mentioned in the article's updates:

"UPDATE: see https://www.realworldtech.com/forum/?threadid=200693&curpost... for a dramatic simplification. Not catching this is an oversight on my part. This post will be updated to include numbers with the mentioned strategy.

UPDATE: To my surprise and after much fiddling, I did not manage to write a version that was measurably faster (indeed they were at least a percent slower) than the hand written sum_avx512 shown below. There is almost certainly something that I am doing wrong but I can’t seem to figure out what it is. I will take this opportunity to leave this as an exercise for the reader :)."

There are many ways to solve every problem. Sometimes the factor is easy to see and it drops right out of the equation.
And even if you didn't see it, I wonder what the motivation is for trying to do it all in one loop.

    sum(byteVals != -128) + sum(intVals)
should vectorize more nicely and be at least as cache friendly.
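
A rough Python sketch of that two-pass idea, reading `sum(byteVals != -128)` as "sum the bytes that are not -128" (taken literally in something like NumPy, that expression would count them instead). It assumes -128 is a sentinel byte marking a value stored in intVals. SENTINEL and sum_two_pass are illustrative names only.

```python
SENTINEL = -128  # assumed marker byte for "value lives in int_vals"


def sum_two_pass(byte_vals, int_vals):
    # Pass 1: add up the small values, skipping the sentinels.
    small = sum(b for b in byte_vals if b != SENTINEL)
    # Pass 2: add up the wide values in their own loop.
    return small + sum(int_vals)


# Hypothetical sample data: two sentinels, two wide values.
byte_vals = [3, -128, -7, 100, -128]
int_vals = [1000, -500]

print(sum_two_pass(byte_vals, int_vals))  # 596
```

Each pass is a plain reduction over one contiguous array, which is the shape that tends to vectorize well. Whether it actually beats a single fused loop would need measuring.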