by kev009 2862 days ago
OTOH, you can make some pretty sweeping generalizations. For example, FreeBSD recently reordered the members of its TCP structures to line up on assumed 64B boundaries, which mitigates some of the false sharing associated with locks and memory barriers (locking implementations usually cause a complete cache line read-modify-write that has to be communicated over the cache coherence protocol, potentially xxx ns away). For large structures, placing members in the first or second line (the first 64 to 128B) will generally get you cache hits thanks to prefetching, so adjacent lines can behave as if they were somewhat ganged in practice. Conversely, if you have something designed for 64B lines that isn't accessed in a linear fashion, the prefetcher may be wasting a lot of bandwidth. https://www.akkadia.org/drepper/cpumemory.pdf
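To make the layout idea concrete, here is a rough C11 sketch of grouping members by cache line. The struct, its field names, and the `CACHE_LINE` constant are all made up for illustration; this is not FreeBSD's actual TCP control block, and 64B is an assumption that does not hold on every CPU.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas, alignof */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

#define CACHE_LINE 64 /* assumed line size; check your target CPU */

/* Hypothetical connection state, NOT FreeBSD's real struct layout. */
struct conn {
    /* Line 0: the lock and the fields written while holding it. */
    alignas(CACHE_LINE) uint32_t lock;
    uint32_t snd_nxt;
    uint32_t rcv_nxt;

    /* Line 1: read-mostly settings, kept off the lock's line so that
     * readers are not invalidated by every lock acquisition. */
    alignas(CACHE_LINE) uint32_t mss;
    uint32_t flags;

    /* Line 2: cold statistics that are rarely touched. */
    alignas(CACHE_LINE) uint64_t bytes_in;
    uint64_t bytes_out;
};

/* Which (assumed) cache line a given byte offset falls into. */
static inline size_t line_of(size_t offset) { return offset / CACHE_LINE; }

/* Compile-time checks so a later reshuffle cannot silently undo the layout. */
static_assert(alignof(struct conn) == CACHE_LINE,
              "struct must start on a cache line boundary");
static_assert(offsetof(struct conn, mss) / CACHE_LINE !=
              offsetof(struct conn, lock) / CACHE_LINE,
              "read-mostly fields must not share a line with the lock");
```

The static assertions are the part I'd actually keep in real code: the padding costs some memory (this struct grows to three full lines), so you want the compiler to tell you when someone adds a field that breaks the grouping.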