Hacker News
by rollcat 387 days ago
This rule scales up all the way to multi-node supercomputers / cloud. The overhead is negligible when you can employ 10,000 cores.
3 comments

Actually the overhead crushes you when you employ 10,000 cores. If the overhead (the part that cannot be parallelized) of a process is 10% and the parallel part is 90%, then 2 cores will result in a run time of 10% + 90%/2 = 55% of the original time. 10 cores will get you to 19%, and 100 cores to 10.9%. If you then buy 9,900 more cores to bring the total to 10,000, you have only reduced the run time from 10.9% to 10.009%. In other words, you increased your bill by a factor of 100 to reduce your run time by almost nothing.
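For anyone who wants to check the arithmetic above, here is a minimal Python sketch of the same calculation. It assumes the simplest form of Amdahl's law, with a fixed 10% serial part and a 90% part that divides perfectly across cores; the function name `relative_runtime` is made up for illustration.

```python
def relative_runtime(serial_fraction: float, cores: int) -> float:
    """Run time as a fraction of the single-core run time.

    Assumes the serial part never shrinks and the parallel part
    splits evenly across cores (no communication cost).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return serial_fraction + (1.0 - serial_fraction) / cores


if __name__ == "__main__":
    # Same core counts as in the comment above, with a 10% serial part.
    for cores in (1, 2, 10, 100, 10_000):
        t = relative_runtime(0.10, cores)
        print(f"{cores:>6} cores: {t:.3%} of original run time, speedup {1 / t:.2f}x")
```

Under these assumptions the speedup can never exceed 1 / 0.10 = 10x, no matter how many cores you add.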
You two are talking about different kinds of overhead though.
Abstractly, when any parallel system scales up large enough without cross checking or waiting between "threads", the cost of de-duplicating and merging the output will probably outweigh the advantage of producing new results in tandem. I think. That's just a hypothesis, but feels correct. With stuff like a-life distributed over lots of servers all converging on evolutionary answers to a problem, it's the collation and analysis layer that's most expensive and slow. Sharing more frequently / allowing more reliance on central historical truth slows each one down but avoids duplication and redundancy. I guess where that point is depends on what problem you're trying to solve.
Amdahl says no.