Ahh, so it does come back to cache line alignment. Reading aligned data doesn't give any benefit in and of itself[1], at least not on modern hardware. I guess the performance improvement makes sense, though: the cache line size is a multiple of the SIMD register width (e.g. 64-byte lines vs. 16- or 32-byte registers), so an aligned SIMD load never straddles two cache lines, while an unaligned one can.
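
For what it's worth, here's a rough sketch of the cache line arithmetic. It only does address math and doesn't measure anything. The 64-byte line size and the helper names (`crosses_cache_line`, `count_split_loads`) are my own assumptions for illustration, not taken from any library.

```python
CACHE_LINE = 64  # bytes; assumed value, typical for current x86-64 CPUs


def crosses_cache_line(addr: int, width: int, line: int = CACHE_LINE) -> bool:
    """Return True if a load of `width` bytes at `addr` touches two cache lines."""
    first_line = addr // line
    last_line = (addr + width - 1) // line
    return first_line != last_line


def count_split_loads(base: int, width: int, n: int, line: int = CACHE_LINE) -> int:
    """Count how many of `n` back-to-back loads starting at `base` split a line."""
    return sum(crosses_cache_line(base + i * width, width, line) for i in range(n))


if __name__ == "__main__":
    # 32-byte (AVX-sized) loads over 256 bytes of data
    print(count_split_loads(base=0, width=32, n=8))   # base aligned to 32 bytes
    print(count_split_loads(base=16, width=32, n=8))  # base misaligned by 16 bytes
```

With the base aligned to the load width, none of the loads split a line. With the base off by 16 bytes, every other 32-byte load does, and that's where I'd expect the cost to come from.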