by mjb
1755 days ago
The other two answers you got are good. I will say that monitoring p99 (or 99.9 or whatever) is a good thing, especially if you're building human-interactive stuff. Here's my colleague Andrew Certain talking about how Amazon came to that conclusion: https://youtu.be/sKRdemSirDM?t=180

But p99 is just one summary statistic. Most importantly, it's a robust statistic that rejects outliers. That's a very good thing in some cases. It's also a very bad thing if you care about throughput, because throughput is proportional to 1/latency, and if you reject the outliers then you'll overestimate throughput substantially. p99 is one tool. A great and useful one, but not for every purpose.

> Because I dont know anyone who has utilisation to 1 or even 0.5 in production.

Many real systems like to run much hotter than that. High utilization reduces costs, and reduces carbon footprint. Just running at low utilization is a reasonable solution for a lot of people in a lot of cases, but as margins get tighter and businesses get bigger, pushing on utilization can be really worthwhile.
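To make the p99-versus-throughput point concrete, here is a rough sketch in Python. The latency numbers are made up for illustration, the `percentile` helper is a hypothetical nearest-rank implementation, and the "one worker handling requests back to back" model is a simplifying assumption, not a description of any real system.

```python
import statistics


def percentile(samples, q):
    """Nearest-rank percentile for an integer q in 1..100 (hypothetical helper)."""
    ordered = sorted(samples)
    # Integer ceiling of q * n / 100, avoiding floating-point rounding.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Made-up latencies in seconds: 995 fast requests and 5 very slow ones.
latencies = [0.010] * 995 + [5.0] * 5

p99 = percentile(latencies, 99)      # ignores the 5 slow requests entirely
mean = statistics.fmean(latencies)   # pulled upward by the slow requests

# Assumption: a single worker serving requests back to back, so its
# completion rate is roughly 1 / (time spent per request).
throughput_from_mean = 1 / mean
throughput_from_p99 = 1 / p99

print(f"p99 latency:  {p99:.3f} s -> implied throughput {throughput_from_p99:.1f} req/s")
print(f"mean latency: {mean:.5f} s -> implied throughput {throughput_from_mean:.1f} req/s")
```

In this toy data set the slow 0.5% of requests never show up in p99, yet they account for most of the total time spent, so a throughput estimate built on p99 comes out several times too optimistic.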
|
To be fair, it was a real-time gambling operation, and we were operating within the first Nielsen threshold.
PS: bucketing these by request route was quite useful.
EDIT: formatting