|
Sure. For any black-box service endpoint, the latency of a given request is just the time it takes for that operation to complete. Ideally you measure both end-to-end latency from the client and server-side latency, so you can see the impact of the network and, for high-throughput applications, any kernel buffering that takes place. By "high-percentile latency", I mean percentiles of the distribution of all latency measurements gathered from a given endpoint over some period of time. If you picture that distribution as a frequency histogram, the horizontal axis is divided into buckets of latency ranges (e.g., 0-10ms, 10-20ms, 20-30ms, etc.), and the height of each bar is the number of samples in that bucket. What we want to determine is which bucket contains the 95th (or 99th, or 99.9th) percentile latency value. You can see such a latency distribution on page 10 of this paper, which I published while at Google: http://research.google.com/pubs/pub36356.html
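To make the bucket idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular monitoring system does it: the function names, the 10ms bucket width, the nearest-rank percentile definition, and the sample data are all assumptions made up for the example.

```python
import math

def build_histogram(samples_ms, bucket_width_ms=10):
    """Count samples per fixed-width bucket: index 0 is [0, 10) ms, 1 is [10, 20) ms, ..."""
    counts = {}
    for sample in samples_ms:
        idx = int(sample // bucket_width_ms)
        counts[idx] = counts.get(idx, 0) + 1
    return counts

def percentile_bucket(counts, percentile, bucket_width_ms=10):
    """Return the (low_ms, high_ms) range of the bucket that holds the given percentile."""
    total = sum(counts.values())
    # Nearest-rank definition (an assumption): the smallest rank whose
    # cumulative share of samples reaches the requested percentile.
    rank = max(1, math.ceil(percentile * total / 100))
    seen = 0
    for idx in sorted(counts):
        seen += counts[idx]
        if seen >= rank:
            return (idx * bucket_width_ms, (idx + 1) * bucket_width_ms)
    raise ValueError("empty histogram")

# Made-up data: most requests are fast, a few are very slow.
samples = [5] * 90 + [25] * 8 + [250] * 2
hist = build_histogram(samples)

print(sum(samples) / len(samples))    # mean is 11.5 ms
print(percentile_bucket(hist, 50))    # (0, 10)
print(percentile_bucket(hist, 95))    # (20, 30)
print(percentile_bucket(hist, 99))    # (250, 260)
```

With this made-up data the mean is 11.5ms, yet the 99th percentile sits in the 250-260ms bucket, which is the kind of gap an average hides.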
Anyway, latency percentiles are a mouthful to explain, but in practice they are an extremely useful measurement. Average latency is just not that important in interactive applications (webapps or otherwise): what you should be measuring is outlier latency. Every service you've ever heard of at Google has pagers set to track high-percentile latency over the trailing 1m or 5m or 10m (etc.) for user-facing endpoints.
Coming back to Rails: latency is of course a concern through the entire stack. The reason Rails is so problematic (in my experience) is that people writing gems never seem to realize when they can and should be doing things in parallel, with the possible exception of carefully crafted SQL queries that get parallelized in the database. The Node.js community is a little better in that they don't block on every function call by convention the way folks do in Rails, but it's really all just a "cultural" thing. I don't know off the top of my head how things generally work in Django...
One final thing: GC is a nightmare for high-percentile latency, and any dynamic language has to contend with it, especially if multiple requests are processed concurrently, which is of course necessary to get reasonable throughput. Hope this helps. |
In my experience, when using Django or one of the other WSGI-based Python web frameworks, the steps to complete a complex request are serialized just as much as in Rails. The single-threaded process-per-request model, based on the hope that requests will finish fast, is also quite common in Python land.
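As a rough illustration of what "serialized" costs, here is a small Python sketch. Everything in it is hypothetical: `fake_backend_call` stands in for a blocking database, cache, or RPC call, and the three 50ms calls are invented. It is not taken from Django or any WSGI framework.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_backend_call(name, seconds):
    """Hypothetical stand-in for a blocking I/O call (database, cache, RPC)."""
    time.sleep(seconds)
    return name

# Three independent steps of one request, each taking about 50 ms (made-up numbers).
CALLS = [("user", 0.05), ("feed", 0.05), ("ads", 0.05)]

def handle_serially():
    # Each call blocks until the previous one finishes: latencies add up.
    return [fake_backend_call(name, secs) for name, secs in CALLS]

def handle_concurrently():
    # Independent calls overlap: latency is roughly that of the slowest call.
    with ThreadPoolExecutor(max_workers=len(CALLS)) as pool:
        futures = [pool.submit(fake_backend_call, name, secs) for name, secs in CALLS]
        return [future.result() for future in futures]

def timed(fn):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

if __name__ == "__main__":
    for handler in (handle_serially, handle_concurrently):
        result, elapsed = timed(handler)
        print(f"{handler.__name__}: {result} in {elapsed * 1000:.0f} ms")
```

Threads work here only because the simulated calls spend their time waiting. The sketch says nothing about CPU-bound work or about how a real framework schedules requests.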
You mention that GC is a nightmare for high-percentile latency. Isn't this just as much of a problem for Go? Would you continue to develop back-end services in C++ if not for the fact that most developers these days aren't comfortable with C++ and manual memory management?