Hacker News new | ask | show | jobs
by jayd16 1985 days ago
Its a little confusing because they're conflating the idea that you almost certainly read at least the entire word (and not a single byte) at a time with the other idea that you could fetch multiple words concurrently.
1 comments

Any cached memory access is going to read in the entire cache line -- 64 bytes on x86, apparently 128 on M1. This is true across most architectures which use caches; it isn't specific to M1 or ARM.
(As I learned from recent Rust concurrency changes) on newer Intel, it usually fetches two cache lines so effectively 128 bytes while AMD usually 64 bytes. That's the sizes they use for "cache line padded" values (I.e making sure to separate two atomics by the fetch size to avoid threads invalidating the cache back and forth too much).
To be clear here, it fetches two cache lines but it doesn’t put the second in exclusive state until it’s written to; the unit of granularity is still 64b. In a scanning read mode you will see the benefit but you won’t see the contention on writes. (The contention will come from subsequent reads on that cache line though)
Yes almost certainly more than the word will be read but it varies by architecture. I would think almost by definition no less than a word can be read so I went with that in my explanation.