Hacker News new | ask | show | jobs
by twoodfin 2309 days ago
Of course in the general case Amdahl's law is inescapable, but some tasks on modern systems can show > ×N speedup over single-threaded performance if, for example, a single thread can only exploit at maximum some fraction of the total memory bandwidth or some level of the cache hierarchy.
1 comments

Do you have examples where running n threads instead of 1 results in a speedup greater than n?

The only thing I can think of would be that the additional threads would kick the CPU into using a higher frequency, but a single thread using 100% of the CPU should already do that.

There are some broad examples of that effect here: https://en.wikipedia.org/wiki/Speedup#Super-linear_speedup
The caching effects mentioned there have to do with adding additional hardware resources that change the effective cache sizes and shift the working set into higher performing storage or memory.

You can also encounter a funny software engineering effect. Refactoring an array-based algorithm to work in parallel often involves the introduction of block-decomposition. This change allows sub-problems to be spread to different workers. Sometimes, this change will also accelerate the sequential code, as the new block-oriented access patterns improve cache efficiency!

Oh I see, that makes complete sense.

Basically, given two small arrays A and B (such that both fit in the CPU caches), if the computations to do are sum(A) * sum(B) and sum(B) / sum(A), having two threads that do the computations will be more than twice as fast as a single thread. In the two thread case, A and B will be fetched in parallel (assuming left-to-right evaluation), incurring only one round of pipeline stalls for data fetching, whereas the single threaded case will have to incur two pipeline stalls from memory fetching.