
by jandrewrogers 1915 days ago

Interestingly, not sharing read-only data is a well-known optimization for parallel code -- it can famously produce a super-linear scaling effect. While counterintuitive, the reason is mundane.

The majority of high-performance code is limited by memory bandwidth, not CPU or I/O. Consequently, throughput is a function of the aggregate CPU cache efficiency of the entire system. If read-only data is aggressively shared across cores, a large part of the total CPU cache capacity of the system ends up storing the same data multiple times. That shrinks the effective cache, so more accesses miss and go out to memory, wasting memory bandwidth and reducing system throughput.

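Not sharing any read-only data maximizes the amount of the workload resident in the various CPU caches. Each added core also brings its own cache, so the cached fraction of the workload grows along with the core count, which is how scaling can end up better than linear. This can have a significant performance effect in real parallel systems.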
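To put rough numbers on the duplication argument, here is a toy back-of-the-envelope model in Python. It is only a sketch: the function name `unique_cached_bytes`, the 8-core machine, the 1 MiB private cache per core, and the 512 KiB of hot shared data are all made-up assumptions, and it ignores shared last-level caches, associativity, and coherence traffic entirely.

```python
def unique_cached_bytes(cores, private_cache_bytes, shared_hot_bytes):
    """Toy model: distinct bytes held across all private caches.

    Assumes every core keeps its own copy of the shared hot data,
    capped at the size of one private cache. Hypothetical helper,
    not a measurement of any real CPU.
    """
    replicated = min(shared_hot_bytes, private_cache_bytes)
    # One copy is useful; the other (cores - 1) copies are duplicates.
    return cores * private_cache_bytes - (cores - 1) * replicated


KIB = 1024
cores = 8                # assumed core count
private = 1024 * KIB     # assumed 1 MiB private cache per core

# Every core reads the same 512 KiB of read-only data.
shared = unique_cached_bytes(cores, private, shared_hot_bytes=512 * KIB)
# Each core works only on its own partition; nothing is replicated.
partitioned = unique_cached_bytes(cores, private, shared_hot_bytes=0)

print(shared // KIB, partitioned // KIB)  # prints: 4608 8192
```

Under these assumptions, sharing leaves 4608 KiB of distinct data cached versus 8192 KiB when nothing is shared, so a bit under half of the aggregate private cache is spent on duplicates.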