by
gpderetta
1075 days ago
Indeed, the blocked vectorization with 8-bit accumulators shown elsethread is going to be faster, and there reducing the sum to 1 bit per iteration is worth it.
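
For readers who haven't seen the version elsethread, here is a rough sketch of what blocking with 8-bit accumulators can look like. It is not the code from that comment. It assumes the task is counting bytes equal to some target value, and the function name, `LANES` and `MAX_ROUNDS` are made up for illustration. Each lane adds 0 or 1 per round, so a uint8_t lane cannot overflow as long as it is flushed into a wide total at least every 255 rounds. It is written as plain C and leaves the actual vectorization to the compiler.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16        /* assumed vector width in bytes (hypothetical) */
#define MAX_ROUNDS 255  /* a uint8_t lane can absorb 255 increments of 1 */

/* Count the bytes in buf[0..n) that are equal to target. */
size_t count_byte_blocked(const unsigned char *buf, size_t n,
                          unsigned char target)
{
    size_t total = 0;
    size_t i = 0;

    /* Process blocks of at most MAX_ROUNDS * LANES bytes. */
    while (n - i >= LANES) {
        uint8_t acc[LANES] = {0};  /* narrow per-lane accumulators */
        size_t rounds = (n - i) / LANES;
        if (rounds > MAX_ROUNDS)
            rounds = MAX_ROUNDS;

        for (size_t r = 0; r < rounds; r++) {
            /* The comparison yields 0 or 1, so each lane grows by at
               most 1 per round and stays <= 255 within the block. */
            for (size_t l = 0; l < LANES; l++)
                acc[l] += (uint8_t)(buf[i + l] == target);
            i += LANES;
        }

        /* Flush the narrow lanes into the wide total once per block. */
        for (size_t l = 0; l < LANES; l++)
            total += acc[l];
    }

    /* Scalar tail for the last n % LANES bytes. */
    for (; i < n; i++)
        total += (buf[i] == target);

    return total;
}
```

Calling `count_byte_blocked(buf, len, 's')` returns the number of `'s'` bytes in `buf`. The point of the narrow accumulators is that the expensive widening step happens once per block rather than once per round. Whether a given compiler turns the inner loop into byte-wide SIMD adds is something to check in the generated assembly.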