by mwcampbell
4694 days ago
I guess the problem of high-percentile latency is not widely understood; I'm not sure I understand it myself. Can you explain in more detail? In particular, are you talking about requests that take a while to complete because they have some complex processing, or requests that take a long time to complete because they can't be processed until some other long-running request finishes? The bit about everything being serialized suggests that the main concern is the latter. Does this apply even when using multiple threads under the C Ruby implementation? Why does running multiple web server processes on the same machine not mitigate the problem? BTW, I don't use Rails or Ruby, but I do use Python for web apps at work (currently CPython, GIL and all). I'm curious to find out if this problem of high-percentile latency applies to Python as well.
So, for any black-box service endpoint, the latency for any given request is obviously just the time it takes for that operation to complete. Ideally one measures both end-to-end latency from the client and server-side latency in order to understand the impact of the network and, for high-throughput applications, any kernel buffering that takes place.
All of that is obvious, I imagine. By "high-percentile latency", I'm referring to percentiles of the distribution of all latency measurements gathered from a given endpoint over some period of time. If you imagine that distribution as a frequency histogram, the horizontal axis is divided into buckets of latency ranges (e.g., 0-10ms, 10-20ms, 20-30ms, etc.), and the bars of course represent the number of samples in each bucket. What we want to do is determine which bucket contains the 95th percentile (or 99th, or 99.9th) latency value.
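To make the bucket idea concrete, here is a minimal Python sketch (Python because the question mentions it). The function name `percentile_bucket` and the 10ms bucket width are illustrative assumptions, and it uses the nearest-rank percentile definition, which is only one of several in common use. It builds the frequency histogram described above and walks the buckets from fastest to slowest until the cumulative count reaches the requested rank.

```python
import math


def percentile_bucket(samples_ms, pct, bucket_ms=10):
    """Return (low, high) bounds in ms of the histogram bucket holding the pct-th percentile.

    Uses the nearest-rank definition: the smallest sample such that at least
    pct percent of all samples are less than or equal to it.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    # Frequency histogram: bucket index -> number of samples in that bucket.
    counts = {}
    for s in samples_ms:
        b = int(s // bucket_ms)
        counts[b] = counts.get(b, 0) + 1
    # 1-based rank of the percentile sample in sorted order.
    rank = max(1, math.ceil(pct * len(samples_ms) / 100))
    seen = 0
    for b in sorted(counts):  # walk buckets from fastest to slowest
        seen += counts[b]
        if seen >= rank:
            return (b * bucket_ms, (b + 1) * bucket_ms)
    raise ValueError("pct must be between 0 and 100")


# Made-up workload: 98% of requests take 5 ms, 2% take 250 ms.
samples = [5] * 980 + [250] * 20
print(sum(samples) / len(samples))     # 9.9 (the mean looks healthy)
print(percentile_bucket(samples, 95))  # (0, 10)
print(percentile_bucket(samples, 99))  # (250, 260)
```

With this invented workload the mean (9.9ms) and the 95th percentile look fine, while the 99th percentile lands in the 250-260ms bucket, which is exactly the kind of outlier an average hides.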
You can see such a latency distribution on page 10 of this paper, which I published while at Google:
Anyway, it is a mouthful to explain latency percentiles, but in practice they end up being an extremely useful measurement. Average latency is just not that important in interactive applications (webapps or otherwise): what you should be measuring is outlier latency. Every service you've ever heard of at Google has pagers set to track high-percentile latency over a trailing window (1m, 5m, 10m, etc.) for user-facing endpoints.
Coming back to Rails: latency is of course a concern throughout the entire stack. The reason Rails is so problematic (in my experience) is that people writing gems never seem to realize when they can and should be doing things in parallel, with the possible exception of carefully crafted SQL queries that get parallelized in the database. The Node.js community is a little better in that, by convention, they don't block on every function call the way folks do in Rails, but it's really all just a "cultural" thing. I don't know off the top of my head how things generally work in Django...
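The difference between blocking on each backend call in turn and issuing independent calls in parallel can be sketched in Python with `concurrent.futures`. This is an illustration under stated assumptions, not how any particular gem or framework works: `fake_backend_call` is a hypothetical stand-in that simulates a blocking I/O call with `time.sleep`, and the four calls are assumed to be independent of each other. In CPython, blocking I/O (and `time.sleep`) releases the GIL, so threads can overlap waits like these even though they cannot run Python bytecode in parallel.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_backend_call(delay_s):
    """Hypothetical stand-in for a blocking I/O call (DB query, HTTP request)."""
    time.sleep(delay_s)  # like real blocking I/O, sleep releases the GIL
    return delay_s


def handle_sequential(delays):
    # Each call waits for the previous one: latency is roughly sum(delays).
    return [fake_backend_call(d) for d in delays]


def handle_parallel(delays):
    # Independent calls are issued together: latency is roughly max(delays).
    with ThreadPoolExecutor(max_workers=max(1, len(delays))) as pool:
        return list(pool.map(fake_backend_call, delays))  # results keep input order


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


delays = [0.05, 0.05, 0.05, 0.05]  # four independent 50 ms backend calls
seq_result, seq = timed(handle_sequential, delays)
par_result, par = timed(handle_parallel, delays)
print(f"sequential: {seq * 1000:.0f} ms, parallel: {par * 1000:.0f} ms")
```

On a typical machine the sequential handler takes a bit over 200ms and the parallel one a bit over 50ms. Serializing independent calls makes a request's latency the sum of its backends' latencies, so any one slow backend pushes the whole request into the tail.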
One final thing: GC is a nightmare for high-percentile latency, and any dynamic language has to contend with it, especially when multiple requests are processed concurrently, which is of course necessary to get reasonable throughput.
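A toy simulation shows why a collector pause hits the tail rather than the median. It assumes a single stop-the-world pause that freezes all work, unlimited concurrency (no queueing), and invented numbers: one 5ms request arriving every millisecond for one second, plus one 100ms pause starting at 500ms. Real collectors can be incremental or concurrent and behave differently. `latency_with_pause` and `nearest_rank` are hypothetical helpers written for this sketch.

```python
import math


def latency_with_pause(arrival, service, pause_start, pause_len):
    """Latency (ms) of one request if all work stops during [pause_start, pause_start + pause_len)."""
    pause_end = pause_start + pause_len
    if arrival >= pause_end or arrival + service <= pause_start:
        return service  # request never overlaps the pause
    if arrival >= pause_start:
        # Arrived mid-pause: cannot start until the pause ends.
        return pause_end + service - arrival
    # Was already running when the pause began: frozen for the whole pause.
    return service + pause_len


def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


# Hypothetical load: one request per ms for 1 s, each needing 5 ms of work,
# all served concurrently (no queueing), and one 100 ms pause starting at 500 ms.
lat = sorted(latency_with_pause(a, 5, 500, 100) for a in range(1000))
print(nearest_rank(lat, 50), nearest_rank(lat, 90), nearest_rank(lat, 99))  # 5 9 99
print(sum(1 for x in lat if x > 5), max(lat))  # 104 105
```

Only about 10% of requests (104 of 1000) overlap the pause, so the median stays at 5ms, but the 99th percentile jumps from 5ms (with no pause) to 99ms, because every request that is in flight or arrives during the pause is delayed by it.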
Hope this helps.