Hacker News new | ask | show | jobs
by maweki 2310 days ago
"It would be extremely surprising, then, if running with N threads actually gave ×N performance."

Basically impossible by Ahmdal's law.

2 comments

Of course in the general case Amdahl's law is inescapable, but some tasks on modern systems can show > ×N speedup over single-threaded performance if, for example, a single thread can only exploit at maximum some fraction of the total memory bandwidth or some level of the cache hierarchy.
Do you have examples where running n threads instead of 1 results in a speedup greater than n?

The only thing I can think of would be that the additional threads would kick the CPU into using a higher frequency, but a single thread using 100% of the CPU should already do that.

There are some broad examples of that effect here: https://en.wikipedia.org/wiki/Speedup#Super-linear_speedup
The caching effects mentioned there have to do with adding additional hardware resources that change the effective cache sizes and shift the working set into higher performing storage or memory.

You can also encounter a funny software engineering effect. Refactoring an array-based algorithm to work in parallel often involves the introduction of block-decomposition. This change allows sub-problems to be spread to different workers. Sometimes, this change will also accelerate the sequential code, as the new block-oriented access patterns improve cache efficiency!

Oh I see, that makes complete sense.

Basically, given two small arrays A and B (such that both fit in the CPU caches), if the computations to do are sum(A) * sum(B) and sum(B) / sum(A), having two threads that do the computations will be more than twice as fast as a single thread. In the two thread case, A and B will be fetched in parallel (assuming left-to-right evaluation), incurring only one round of pipeline stalls for data fetching, whereas the single threaded case will have to incur two pipeline stalls from memory fetching.

well before that kicks in, if your code requires any coordination at all (not Monte Carlo) then those overheads can scale with the number of processes. so rather than hitting an asymptote as you'd get with just Ahmdal, you actually start to go down in absolute terms as the number of processes increases.