by TinkersW 464 days ago
The accumulator is a vector type; with 64-bit sums you can only fit 4 of them into a 256-bit register.

After the loop it will do a horizontal add across the vector register to produce the final scalar result.
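
To make that concrete, here is a rough scalar sketch of what the vectorized loop is doing. It is not actual compiler output: the function name `sum_i32`, the 32-bit input type, and the `LANES` constant are assumptions for illustration, and the 4-element array just stands in for a 256-bit register holding four 64-bit lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* four 64-bit sums fill one 256-bit register */

/* Hypothetical helper: sums 32-bit inputs into 64-bit lanes. */
int64_t sum_i32(const int32_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    int64_t acc[LANES] = {0, 0, 0, 0};  /* stands in for the vector accumulator */
    size_t i = 0;

    /* Main loop: each lane accumulates every fourth element. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; ++lane) {
            acc[lane] += data[i + lane];
        }
    }

    /* Horizontal add: collapse the four lanes into one scalar. */
    int64_t total = (acc[0] + acc[1]) + (acc[2] + acc[3]);

    /* Scalar tail for the leftover 0..3 elements. */
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The horizontal add happens once, after the loop, so its cost does not grow with the input size; the per-iteration work is just the lane-wise add.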