
When the author removed parallelism the first time, I don't think this is the case. Running things in parallel has a cost. That cost often comes in the form of memory allocations and data copies, needed so the unit of work can be stored and shared with another thread, plus the synchronization cost of scheduling threads. If that aggregate cost is greater than the cost of the computation itself, you'll never win.

For the point at which the author removed parallelism and the sequential code was faster, I think this was the case: the computation was too fine-grained. The author then successfully took advantage of parallelism by applying it at a coarser granularity, so each thread did more work. At this point, the author also tunes the solution for the execution environment, as he uses a fixed set of goroutines to process a batch of messages rather than one goroutine per message.
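
To make the granularity point concrete, here is a rough sketch of the two shapes. It is not the author's code: `process`, `perMessage`, and `fixedPool` are names I made up, the "message" is just an int, and splitting the input into contiguous chunks is my assumption about one way to give each goroutine more work.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// process is a hypothetical stand-in for the per-message work.
// It is deliberately tiny, which is the fine-grained case.
func process(msg int) int {
	return msg * msg
}

// perMessage starts one goroutine per message. Every message pays
// for a goroutine start, a WaitGroup update, and a mutex round trip.
func perMessage(msgs []int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, m := range msgs {
		wg.Add(1)
		go func(m int) {
			defer wg.Done()
			r := process(m)
			mu.Lock()
			total += r
			mu.Unlock()
		}(m)
	}
	wg.Wait()
	return total
}

// fixedPool uses at most `workers` goroutines. Each one handles a
// contiguous chunk, so the fixed overhead is paid once per chunk.
func fixedPool(msgs []int, workers int) int {
	if workers < 1 {
		workers = 1
	}
	partial := make([]int, workers)
	chunk := (len(msgs) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := w * chunk
		if lo >= len(msgs) {
			break
		}
		hi := lo + chunk
		if hi > len(msgs) {
			hi = len(msgs)
		}
		wg.Add(1)
		go func(w int, part []int) {
			defer wg.Done()
			sum := 0
			for _, m := range part {
				sum += process(m)
			}
			partial[w] = sum // each worker writes only its own slot
		}(w, msgs[lo:hi])
	}
	wg.Wait()
	total := 0
	for _, s := range partial {
		total += s
	}
	return total
}

func main() {
	msgs := make([]int, 1000)
	for i := range msgs {
		msgs[i] = i + 1
	}
	// Both versions compute the same total; only the overhead differs.
	fmt.Println(perMessage(msgs) == fixedPool(msgs, runtime.NumCPU()))
}
```

Both functions return the same result. Whether the pooled version actually beats a plain sequential loop still depends on how expensive `process` is relative to the per-goroutine overhead, so you would have to benchmark it on your own workload.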