Agreed. The best approach is going to be packet steering [1] plus one pinned thread per core, so that each connection is handled on a single core as completely as possible.
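To make the thread-per-core idea concrete, here is a minimal sketch, not a definitive implementation. It assumes Linux, and it uses `SO_REUSEPORT` so each worker can own its own listening socket; that option is my assumption, not something named above. The helper names (`pin_to_core`, `make_listener`, `worker`) are hypothetical. The sketch does not configure NIC steering itself, so the kernel is not guaranteed to deliver a flow's packets on the same core as the worker that accepted it; that part still has to be set up as described in [1].

```python
import os
import socket


def pin_to_core(core: int) -> None:
    """Pin the caller to a single core. Linux only; a no-op elsewhere."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})


def make_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Create a listening socket that can share its port with other workers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT lets every worker bind its own socket to the same port;
    # the kernel then spreads incoming connections across those sockets.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def worker(core: int, port: int) -> None:
    """One independent echo server per core (hypothetical request handler)."""
    pin_to_core(core)
    listener = make_listener(port)
    while True:
        conn, _ = listener.accept()
        with conn:
            data = conn.recv(4096)
            if data:
                conn.sendall(data)
```

In practice you would start one `worker` per core, each in its own process (for example with `multiprocessing.Process(target=worker, args=(core, 8080))`), since CPython threads would not run in parallel.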

...with the caveat that load balancing gets much harder when each core is essentially an independent server. If you overload some cores, even briefly, your tail latency will really suffer. And if you lower utilization to compensate, you've lost the efficiency advantage you were going for in the first place. So the more conventional approach of a single multi-core reactor can be much better if you don't have a very good load-balancing story.
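The tail-latency effect can be illustrated with a toy queueing simulation. This is a rough sketch under assumptions of my own: Poisson arrivals, exponentially distributed service times, and random placement standing in for a load balancer that is unaware of per-core load. The function name `p99_latency` and all parameters are hypothetical. It compares independent per-core queues against one shared queue served by all cores, on the same workload.

```python
import random


def p99_latency(num_cores: int, num_jobs: int, utilization: float,
                shared: bool, seed: int = 0) -> float:
    """Return the 99th-percentile latency (queueing + service) of a toy server."""
    workload = random.Random(seed)        # arrivals and service times
    placement = random.Random(seed + 1)   # core choice in the partitioned case
    mean_service = 1.0
    arrival_rate = utilization * num_cores / mean_service
    free_at = [0.0] * num_cores           # time at which each core goes idle
    now = 0.0
    latencies = []
    for _ in range(num_jobs):
        now += workload.expovariate(arrival_rate)
        service = workload.expovariate(1.0 / mean_service)
        if shared:
            # Shared FIFO queue: the job runs on whichever core frees up first.
            core = min(range(num_cores), key=free_at.__getitem__)
        else:
            # Independent servers: the job is stuck behind its core's backlog.
            core = placement.randrange(num_cores)
        start = max(now, free_at[core])
        free_at[core] = start + service
        latencies.append(free_at[core] - now)
    latencies.sort()
    return latencies[int(0.99 * len(latencies))]


if __name__ == "__main__":
    for shared in (False, True):
        label = "shared queue" if shared else "per-core queues"
        print(label, round(p99_latency(8, 20000, 0.8, shared), 2))
```

The model ignores the cache and locking costs that motivate thread-per-core in the first place, so it only shows the load-balancing side of the trade-off.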

...another caveat: if you have some massive shared dataset (think search), the cache-efficient approach goes the opposite way: each core should own a shard, and a single request should be fanned out across all of them.
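A minimal sketch of that shard-and-fan-out shape, assuming a trivial whole-word search over in-memory documents. The names `build_shards`, `search_shard`, and `fan_out_search` are hypothetical, and a thread pool stands in for pinned per-core workers; a real system would pin each shard's worker to its core so the shard stays hot in that core's cache.

```python
from concurrent.futures import ThreadPoolExecutor


def build_shards(docs, num_shards):
    """Partition documents round-robin so each worker owns one shard."""
    shards = [[] for _ in range(num_shards)]
    for doc_id, text in enumerate(docs):
        shards[doc_id % num_shards].append((doc_id, text))
    return shards


def search_shard(shard, term):
    """Scan a single shard; only this shard's data is touched."""
    return [doc_id for doc_id, text in shard if term in text.split()]


def fan_out_search(shards, term, pool):
    """Send one request to every shard and merge the partial results."""
    partials = pool.map(search_shard, shards, [term] * len(shards))
    return sorted(doc_id for part in partials for doc_id in part)


if __name__ == "__main__":
    docs = ["red fox", "blue fox", "red hen", "green hen"]
    shards = build_shards(docs, 2)
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        print(fan_out_search(shards, "red", pool))
```

Note that every request now touches every core, which is the opposite of keeping a connection on one core.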

...so the best model may vary, but it's not the one in this article.

[1] https://www.kernel.org/doc/html/v5.1/networking/scaling.html