by kasey_junk 3423 days ago
The issue with doing "seconds per request overhead" instead of doing "requests per second" is that you've switched what you are measuring.

The requests per second statistic is measuring throughput, and the results from such a test can be easily represented as a single value. The seconds per request statistic is a measure of latency. Latency can't be represented with a single value in a meaningful way. It is a curve of values, so you'd need to know what percentage of requests fell under a threshold.

Where those thresholds sit is extremely use-case specific. Some people only care about 95% of requests; others have to care about much higher percentiles.

So if anyone gave me a single data point about their system latency, I'd be skeptical they knew what they were talking about. Even in this case we don't know if the latencies changed across the board, only on a few outliers, or on just the middle of the latency curve.

That said, I agree that this is a bit of a tempest in a teapot. In real world usage, if this regression really matters to you, you've probably already moved off of the standard library for a variety of other reasons.

2 comments

First of all, tweaking what we're measuring is sort of my point.

Second, though, if we're going to slice and dice that way, which is valid, I think you need to go even farther and point out that there are two cases. The first is when you are hammering requests through as quickly as possible, and the second is when you are not.

The latency numbers are highly specific to your load, because as load increases, things like scheduling algorithms start mattering more, especially the fundamental tradeoffs between latency and throughput. Knowing the distribution of these numbers under load is important... though I'd suggest that said distribution is still fairly likely to be dominated by the user code rather than the framework code. But the hello world benchmark is still a crucial one, because it serves as the limit of performance, so if you can show that some webserver can't even do what you need with that, you can eliminate it.

There is also the "request overhead in seconds" you get for a relatively uncontended system, where the system would have to be fairly pathologically broken to see a high variance in results. (You'll get some from GC, but in this case I wouldn't call that variance high in the patterns you'll see from a hello-world handler.) This number is important because while it is in a lot of ways more boring, it is also, I suspect, the relevant number for the modal web server. I suspect this is another one of those cases where some very visual image leaps to mind, the web server for Google or Facebook that is constantly getting hammered at 90% of capacity (and that carefully by design since systems get increasingly pathological as you approach 100%) serving highly optimized requests where every microsecond matters... but those are actually the rare web servers in the world. Most webservers are doing at least one of twiddling their thumbs for long stretches of time or waiting for user code to do what it's going to do in the milliseconds... or seconds... or minutes....

If what you are suggesting is that latency measurement is difficult but what is probably most interesting in the context of http service libraries, I completely agree.

My major issue was that if they had run this exact same test and reported "request overhead in seconds", the result would be largely not valuable at all, because it doesn't give you nearly enough information to determine whether there has been a meaningful latency regression.

With throughput, it's likely not as valuable in real usage, but the single stat does tell you there was a throughput regression.

So I think we agree that this isn't a meaningful regression, I just disagree that changing how you report the number would be valuable.

The thing is that humans usually care about the latency-CDF, even if they don't know it.

What good does a 100-microsecond average latency (calculated as the inverse of the throughput) do for you when simply loading a website issues 200 requests and your 99th percentile is closer to 500 ms for whatever reason? Suddenly your per-load average looks a lot different than your per-request average.

Pure throughput is what you want for batch processing without those pesky, impatient humans in the loop.

Agree with your point, but average latency isn't as simple as inverse of throughput, even on a serial processor.

Imagine a process that takes in a request, sleeps for 10s, and then provides a response. If it takes in 1 million req/s, it can still provide 1 million responses/s, for a throughput of 1 million req/s. The average latency is still 10s.

Approximating latency as 1/throughput is only valid on a process that only handles 1 request at a time (no concurrency). I doubt this is the case for Go.

Latency impacts user happiness (did the page load quickly?). Throughput impacts operating costs (I need to buy N% more servers to serve as many requests with Go 1.8 as I did with 1.7.5).
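As a back-of-the-envelope sketch of the cost side in Go: assuming capacity is purely throughput-bound and scales linearly with server count, a fractional drop in per-server throughput translates into extra servers as follows. The drop percentages are placeholders, not measurements from the issue, and `extraCapacity` is a hypothetical helper name.

```go
package main

import "fmt"

// extraCapacity returns the fractional increase in server count needed to
// serve the same load after per-server throughput drops by the given
// fraction (0 <= drop < 1).
func extraCapacity(drop float64) float64 {
	return 1/(1-drop) - 1
}

func main() {
	// Placeholder regressions, for illustration only.
	for _, drop := range []float64{0.05, 0.10, 0.20} {
		fmt.Printf("throughput -%.0f%% -> servers +%.1f%%\n",
			drop*100, extraCapacity(drop)*100)
	}
}
```

Note that the extra capacity is always somewhat larger than the drop itself, since the remaining throughput is the denominator.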

From the original GitHub issue:

    Thread Stats        Avg       Stdev     Max      +/- Stdev
    Go 1.8rc3 Latency   192.49us  451.74us  15.14ms  95.02%
    Go 1.7.5  Latency   210.16us  528.53us  14.78ms  94.13%

Go 1.8rc3 has both a lower mean latency and a lower standard deviation than Go 1.7.5. Go 1.8 decreased latency at the cost of decreased throughput.