Hacker News new | ask | show | jobs
by CUViper 2309 days ago
There are some broad examples of that effect here: https://en.wikipedia.org/wiki/Speedup#Super-linear_speedup
2 comments

The caching effects mentioned there have to do with adding additional hardware resources that change the effective cache sizes and shift the working set into higher performing storage or memory.

You can also encounter a funny software engineering effect. Refactoring an array-based algorithm to work in parallel often involves the introduction of block-decomposition. This change allows sub-problems to be spread to different workers. Sometimes, this change will also accelerate the sequential code, as the new block-oriented access patterns improve cache efficiency!

Oh I see, that makes complete sense.

Basically, given two small arrays A and B (such that both fit in the CPU caches), if the computations to do are sum(A) * sum(B) and sum(B) / sum(A), having two threads that do the computations will be more than twice as fast as a single thread. In the two thread case, A and B will be fetched in parallel (assuming left-to-right evaluation), incurring only one round of pipeline stalls for data fetching, whereas the single threaded case will have to incur two pipeline stalls from memory fetching.