| HN Mirror

Maybe if you operated your system at the saturation point, which you really don't want to do in practice. Instead you want your queues to be mostly empty. Bursts are inevitable, but a burst coinciding with the GC doing work is hopefully a beyond-99th-percentile event. Of course, if we're speaking purely theoretically, we could also assume spherical cows in a vacuum and say that requests don't burst but simply arrive in a metronome-like trickle, and then the spikes evaporate too. This is basically queuing theory.
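To put rough numbers on "keep your queues mostly empty": here is a sketch using the textbook M/M/1 result (Poisson arrivals, exponential service times, one worker), where mean time in system is W = 1/(μ − λ). For contrast it also covers the "metronome" case (D/D/1: perfectly periodic arrivals, fixed service time), where nothing ever queues below saturation. The 1000 req/s service rate and the function names are made up for illustration.

```python
def mm1_mean_latency(arrival_rate, service_rate):
    """Mean time in system (queueing + service) for an M/M/1 queue: 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def dd1_latency(arrival_rate, service_rate):
    """Periodic arrivals + fixed service time (D/D/1): no waiting at or below saturation."""
    if arrival_rate > service_rate:
        raise ValueError("unstable: arrival rate must not exceed service rate")
    return 1.0 / service_rate


if __name__ == "__main__":
    service_rate = 1000.0  # hypothetical: one worker handles 1000 req/s (1 ms per request)
    for utilization in (0.1, 0.5, 0.8, 0.9, 0.99):
        lam = utilization * service_rate
        print(
            f"rho={utilization:.2f}  "
            f"M/M/1: {mm1_mean_latency(lam, service_rate) * 1e3:7.2f} ms  "
            f"D/D/1: {dd1_latency(lam, service_rate) * 1e3:.2f} ms"
        )
```

Under these assumptions, mean latency stays close to the bare 1 ms service time at low utilization but grows like 1/(1 − ρ) as you approach saturation. With metronome arrivals it never grows at all, which is exactly the "spikes evaporate" case.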

And you could also use more CPU cores than request workers; that way you will always have spare core capacity, and your latency will not be directly impacted. That is, if you really, really value latency over throughput.

Again, my main point is that throughput and latency are not the same thing. They are related insofar as you cannot keep your latency promises if your throughput is insufficient and your queues start filling up. But below the saturation point the relationship is a lot more complicated, especially in parallel systems with bursty arrivals.
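A tiny deterministic simulation shows the "same throughput, different latency" point. It uses one FIFO worker with a fixed 1 ms service time and feeds it 500 req/s (50% utilization) in two ways: evenly spaced, or in bursts of 10 that arrive at the same instant. Both patterns have the same average arrival rate. This is a simplified model (single worker, no GC, no parallelism), and all names and numbers are hypothetical.

```python
def simulate_fifo(arrival_times, service_time):
    """Single-worker FIFO queue; returns each request's latency (wait + service)."""
    latencies = []
    worker_free_at = 0.0
    for t in arrival_times:
        start = max(t, worker_free_at)  # wait if the worker is still busy
        worker_free_at = start + service_time
        latencies.append(worker_free_at - t)
    return latencies


def steady_arrivals(rate, n):
    """Metronome-like arrivals: one request every 1/rate seconds."""
    return [i / rate for i in range(n)]


def bursty_arrivals(rate, n, burst_size):
    """burst_size requests arrive together; bursts are spaced so the average rate is unchanged."""
    period = burst_size / rate
    return [(i // burst_size) * period for i in range(n)]


def percentile(values, p):
    """Simple nearest-rank-style percentile (no interpolation)."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[idx]


if __name__ == "__main__":
    service_time = 0.001  # hypothetical: 1 ms per request
    rate = 500.0          # 500 req/s offered -> 50% utilization for both patterns
    n = 10_000
    patterns = [
        ("steady", steady_arrivals(rate, n)),
        ("bursty", bursty_arrivals(rate, n, burst_size=10)),
    ]
    for name, arrivals in patterns:
        lat = simulate_fifo(arrivals, service_time)
        mean_ms = sum(lat) / len(lat) * 1e3
        p99_ms = percentile(lat, 99) * 1e3
        print(f"{name}: mean={mean_ms:.2f} ms  p99={p99_ms:.2f} ms")
```

In this model, the steady stream sees every request finish in 1 ms. The bursty stream averages about 5.5 ms with a p99 around 10 ms, even though the worker is idle half the time and the throughput is the same.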