by emperorcezar 2930 days ago
It only does because the author is testing in their specific environment. At some number of cores, the concurrent version will outperform the sequential one.

Ideally, in this case I would think one would want to check the number of cores and decide what route to take.


I don't think this is the case for the first time the author removed parallelism. Running things in parallel has a cost. That cost often comes in the form of memory allocations and data copies, so that the unit of work can be stored and shared with another thread, plus the synchronization cost of scheduling threads. If that aggregate cost is greater than the cost of the computation itself, you'll never win.

At the point where the author removed parallelism and the sequential code was faster, I think that is what happened: the computation was too fine-grained. The author successfully took advantage of parallelism by applying it at a coarser granularity, so each thread did more work. At this point, the author also does tune the solution for the execution environment, as he uses a fixed set of goroutines to process a bunch of messages rather than one goroutine per message.

scott_s, you're totally right on both points.

FWIW I really mean the "take the numbers with a grain of salt" advice, i.e. "Your mileage may vary". What I'm sharing in this article is not a bunch of hard, strong, exact numbers; it's a journey and an invitation to apply a similar reasoning process to your own use case and hardware.

For the record, I enjoyed your post. It's a great example of what clear-headed performance optimization looks like.