by newgre 464 days ago
Why did the compiler even choose to fetch only DWORDs in the first place? It's unclear to me why the accumulator (apparently) determines the vectorization width.
1 comment

The accumulator is a vector type: with a 64-bit sum, only 4 lanes fit into a 256-bit register, so each loop iteration can only consume 4 input elements.

After the loop it will do a horizontal add across the vector register to produce the final scalar result.
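
To make that concrete, here is a rough scalar model of what the vectorized loop does. It is only a sketch, not the compiler's actual output: it assumes the input elements are bytes (so 4 elements are exactly one DWORD per load), and `vector_sum`, `LANES` and `MASK64` are made-up names for illustration.

```python
LANES = 4                 # 256-bit register / 64-bit accumulator = 4 lanes
MASK64 = (1 << 64) - 1    # emulate 64-bit wraparound in each lane


def vector_sum(data: bytes) -> int:
    """Sum bytes the way a 4-lane, 64-bit vector accumulator would."""
    acc = [0] * LANES                      # the vector accumulator
    full = len(data) - len(data) % LANES   # part covered by whole vector steps

    # Main loop: one "DWORD load" (4 bytes) per iteration, each byte
    # widened to 64 bits and added to its own lane.
    for i in range(0, full, LANES):
        chunk = data[i:i + LANES]
        for lane in range(LANES):
            acc[lane] = (acc[lane] + chunk[lane]) & MASK64

    # Horizontal add: collapse the 4 lanes into one scalar.
    total = 0
    for lane_value in acc:
        total = (total + lane_value) & MASK64

    # Scalar tail for the leftover elements (fewer than 4).
    for b in data[full:]:
        total = (total + b) & MASK64
    return total
```

Calling `vector_sum(bytes(range(10)))` runs two vector iterations over the first 8 bytes, does the horizontal add, and then handles the last 2 bytes in the scalar tail. A narrower accumulator would give more lanes per register and therefore a wider load per iteration, which is why the accumulator type ends up dictating the fetch width.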