Hacker News new | ask | show | jobs
by ninkendo 3032 days ago
What about the requests that are processed by a core that's doing a GC? Wouldn't that cause a higher P99 latency exactly as you'd expect?

Spending 2.5% of cycles on GC doesn't mean those cycles are perfectly distributed. The distribution of GC work onto cores is bound to have some cores doing more work than others, which would (I would think) manifest itself as some requests that land on an unlucky core getting more latency. Isn't it totally expected that this would cause a P99 latency spike?

1 comments

Maybe if you operated your system at the saturation point, which you really don't want to do in practice. Instead you want your queues to be mostly empty. Bursts are inevitable but bursts coinciding with GC doing work hopefully is a beyond 99th percentile thing. Of course if we're speaking purely theoretical we could also assume spherical cows in a vacuum and say that requests don't burst and simply arrive in a metronome-like trickle and then the spikes evaporate too. This is basically queuing theory.

And you could also use more CPU cores than request workers, that way you will always have spare core capacity and thus your latency will not be directly impacted. That is if you really really value latency more than throughput.

Again, my main point is that throughput and latency are not the same thing. There is some relation in so far that you cannot fulfill latency promises if your throughput is insufficient and your queues start filling up. But below the saturation point it's a lot more complicated, especially in parallel systems with bursty arrivals.