by ajtulloch 2645 days ago
Unaligned and aligned loads of AVX vectors have basically the same performance since Ivy Bridge IIRC.

I was under the impression that unaligned ops run at the same speed but use up more register ports, so they do reduce bandwidth between the register file and cache. Or does this no longer apply either?
My understanding is that the first unaligned load uses more register ports[0], but a second (third, etc.) contiguous load doesn't. IANA[Intel microarchitecture expert], though.

0: Or more memory bandwidth anyway.
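
A rough way to sanity-check the bandwidth half of that footnote is to count cache lines. This is only a sketch under assumptions not stated in the thread: 64-byte cache lines, 32-byte (256-bit) AVX vectors, and a purely arithmetic model that says nothing about ports or split-load penalties. The function names are made up for illustration.

```python
CACHE_LINE = 64  # bytes; assumed cache-line size
VECTOR = 32      # bytes; one 256-bit AVX vector

def lines_touched(start, nbytes, line=CACHE_LINE):
    """Number of distinct cache lines covered by [start, start + nbytes)."""
    first = start // line
    last = (start + nbytes - 1) // line
    return last - first + 1

def stream_lines(start, count, width=VECTOR, line=CACHE_LINE):
    """Distinct cache lines covered by `count` contiguous loads from `start`."""
    return lines_touched(start, count * width, line)

def splits(start, count, width=VECTOR, line=CACHE_LINE):
    """How many of `count` contiguous loads straddle a cache-line boundary."""
    return sum(
        lines_touched(start + i * width, width, line) == 2
        for i in range(count)
    )

if __name__ == "__main__":
    for base in (0, 16):
        print(base, stream_lines(base, 8), splits(base, 8))
```

In this model, eight contiguous loads starting at byte offset 16 cover five cache lines instead of four, so the extra traffic is one line for the whole run rather than one per load. Every other load still straddles a line boundary, though, and whether that costs anything beyond the extra line depends on the microarchitecture, which this sketch does not model.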