by ezdiy 2986 days ago
TLB is only an indirect cause. This is because the kernel scheduler preempts processes fairly infrequently (100 or 1000 Hz, or dynamic, but still capped to a small number).

Scheduling quanta are so large precisely to keep the TLB flush overhead of a context switch low. If a network mandates more interaction (say, 100k req/s across all workers), each quantum tick must process a queued bundle on the order of 1000 requests (100k req/s at a 100 Hz tick) which piled up while the process was asleep. This works as designed - you're supposed to use up all of your quantum, and not terminate it early by issuing blocking IO per request. One prerequisite for this is that your network/disk protocol must be pipelineable (most are, because that's how we deal with network/seek latencies).
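
To make the arithmetic concrete, here is a rough sketch of "drain everything that piled up, once per wakeup". The names requests_per_tick and drain are made up for illustration, the 100 Hz tick is an assumption picked to match the 1000-request figure, and a plain deque stands in for whatever socket or pipe buffer the real process would read from.

```python
from collections import deque

def requests_per_tick(req_per_sec: int, tick_hz: int) -> int:
    """Requests that pile up between two ticks, assuming a steady arrival rate."""
    return req_per_sec // tick_hz

def drain(pending: deque, handle) -> list:
    """Handle every queued request in one wakeup instead of one request per wakeup."""
    replies = []
    while pending:
        replies.append(handle(pending.popleft()))
    # With a pipelineable protocol, these replies could go out in a single write.
    return replies

# Assumed numbers: 100k req/s arriving, process woken 100 times per second.
backlog = deque(range(requests_per_tick(100_000, 100)))
replies = drain(backlog, lambda req: req * 2)  # placeholder handler
```

At a 1000 Hz tick the same load would give batches of 100 instead; either way the point is that the whole batch is consumed inside one quantum with no blocking call per request.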

But at a certain point the overhead of this pipelining itself becomes so great (message queues too deep) that you have to switch to threading.

Hardcore threading advocates, on the other hand, need to account for the overhead of atomics (for locking, or for "lockless" algorithms). An atomic must wait for all pending write-backs to flush. Threading gets a lot of bad rep not because "kernels suck at it", but because the person making such a statement wrote their program as an exercise in lock contention and/or too much write cache pollution per single atomic.
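
A toy illustration of the "exercise in lock contention" pattern versus doing the work locally and synchronizing once. CountingLock and run are hypothetical helpers written for this sketch. Python's GIL hides the actual cache and write-back costs, so this only counts how many synchronization points each style hits, it does not measure them.

```python
import threading

class CountingLock:
    """A lock that records how often it is taken (illustrative helper only)."""
    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1  # safe to bump: the lock is held here
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False

def run(workers: int, items_per_worker: int, batched: bool):
    lock = CountingLock()
    total = [0]

    def contended():
        # One lock round-trip per item: every increment is a sync point.
        for _ in range(items_per_worker):
            with lock:
                total[0] += 1

    def batched_worker():
        # Accumulate privately, then publish once.
        local = 0
        for _ in range(items_per_worker):
            local += 1
        with lock:
            total[0] += local

    target = batched_worker if batched else contended
    threads = [threading.Thread(target=target) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0], lock.acquisitions
```

With 4 workers and 1000 items each, both styles produce the same total, but the first takes the lock 4000 times and the second 4 times. In a compiled language each of those acquisitions is an atomic, which is where the cost described above comes from.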

The threading vs process tradeoff is really a tradeoff between deep pipeline overhead and frequent queue flush + locking overhead.

Typically, you need to meet somewhere in the middle for best performance, which is when you end up with threads with job queues - those basically emulate process-induced queues within the thread model.
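
For what "threads with job queues" can look like, a minimal sketch: each worker thread owns a private queue, much like a process owns its socket buffer, and the dispatcher only touches shared state when it enqueues. The names worker and run_pool, the round-robin dispatch and the squaring "job" are all assumptions for illustration, not anything a particular server does.

```python
import queue
import threading

def worker(jobs: queue.Queue, results: list) -> None:
    """Consume a private queue; no other worker reads from it."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work
            return
        results.append(job * job)  # placeholder for real request handling

def run_pool(items, n_workers: int = 2) -> list:
    queues = [queue.Queue() for _ in range(n_workers)]
    results = [[] for _ in range(n_workers)]  # one private result list per worker
    threads = [
        threading.Thread(target=worker, args=(q, r))
        for q, r in zip(queues, results)
    ]
    for t in threads:
        t.start()
    for i, item in enumerate(items):
        queues[i % n_workers].put(item)  # round-robin dispatch
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return results

out = run_pool(range(6))
```

The only synchronization left is inside each queue's put/get, so how deep you let those queues get is the knob between the two extremes.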

1 comment

   > Threading gets a lot of bad rep
   > not because "kernels suck at it",
   > but because the person making such a
   > statement wrote their program as
   > an exercise in lock contention
Well put. I'm going to have this printed on a plaque and hung above my desk.