Hacker News
by celeritascelery 1886 days ago
> The implementation is similar to the one in simdjson except that it aligns reads to the block size of the SIMD extension, which leads to better peak performance compared to the implementation in simdjson.

I didn't really understand this part. Aligned to what? To the cache line? A SIMD load always reads a full block anyway. Unless I am missing something here.


I read it as "to the width of the SIMD registers" which I have seen in other quick scanners, but did not read the code here.
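For what it's worth, here is a rough sketch of what "aligning reads to the register width" usually means in a scanner. This is not taken from the library being discussed. The function name `split_for_aligned_reads` and the 32-byte default block are assumptions for illustration, and handling the head and tail bytewise is just one common approach.

```python
def split_for_aligned_reads(start: int, length: int, block: int = 32):
    """Split the byte range [start, start + length) into (head, body, tail) sizes.

    head: bytes before the first block-aligned address (processed bytewise)
    body: bytes covered by whole aligned blocks (processed with SIMD-width reads)
    tail: leftover bytes after the last whole block (processed bytewise)
    """
    if block <= 0 or block & (block - 1):
        raise ValueError("block must be a power of two")
    # Distance from start to the next multiple of block (0 if already aligned),
    # capped at the buffer length for very short inputs.
    head = min(-start % block, length)
    remaining = length - head
    body = remaining - remaining % block
    tail = remaining - body
    return head, body, tail
```

For a 200-byte buffer starting at address 100, this gives a 28-byte head, 160 bytes of aligned blocks starting at address 128, and a 12-byte tail.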
Aren't all reads aligned to the width of the SIMD register? If I issue an AVX-512 instruction it will read 512 bits, right?
It's about where you read data from, not how much data gets read. For example, a 256-bit AVX read is aligned if the address being read from is a multiple of 32 bytes; otherwise it's unaligned and runs slightly slower, and slower still if it happens to straddle two cache lines. The same applies to writes.

It's less of an issue than it used to be, since newer CPU architectures have steadily reduced the penalty for unaligned access, but it's still there.
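In case it helps, the two conditions above are just address arithmetic. A minimal sketch, treating addresses as plain integers and assuming a 64-byte cache line (typical on x86-64, but an assumption here); the helper names are made up for illustration:

```python
CACHE_LINE = 64  # assumed cache line size in bytes


def is_aligned(addr: int, width: int) -> bool:
    """True if addr is a multiple of the access width in bytes (32 for 256-bit AVX)."""
    return addr % width == 0


def straddles_cache_line(addr: int, width: int, line: int = CACHE_LINE) -> bool:
    """True if the bytes [addr, addr + width) touch more than one cache line."""
    first_line = addr // line
    last_line = (addr + width - 1) // line
    return first_line != last_line
```

So a 32-byte read at 0x1010 is unaligned but stays inside one cache line, while the same read at 0x1030 is unaligned and also splits across two lines.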

Ahh, so it does come back to cache line alignment. Reading aligned data doesn't give any benefit in and of itself[1], at least not on modern hardware. I guess the performance improvement would make sense, since the cache line size is a multiple of the SIMD register width, so an aligned read never straddles two cache lines.

[1] https://lemire.me/blog/2012/05/31/data-alignment-for-speed-m...
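To put rough numbers on the cache line point, here is a small sketch that counts, for every possible start offset inside one cache line, how many reads of a given width would cross into the next line. The 64-byte line size and the function name `crossing_offsets` are assumptions for illustration.

```python
def crossing_offsets(width: int, line: int = 64, aligned_only: bool = False) -> int:
    """Count start offsets within one cache line whose read crosses into the next line.

    With aligned_only=True, only offsets that are multiples of width are considered.
    """
    step = width if aligned_only else 1
    return sum(1 for off in range(0, line, step) if off + width > line)
```

Under those assumptions, 31 of the 64 possible start offsets make a 32-byte read cross a line boundary, while none of the aligned offsets do. The same holds for 16-byte and 64-byte reads, because each of those widths divides the line size evenly.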