by randomstring 2207 days ago
My last company had something like that, including response time percentiles (50th, 90th, 95th, 99th). We had these values graphed and displayed on a big screen in our office, along with a ton of other performance stats: queries per second, various measures of system load, etc.

Averages can lie, especially when something like an empty query can take close to zero time compared to a non-trivial transaction. If some robot or other artifact of your site is generating some number of null queries, that will make your average response time look better than it actually is. Percentiles, particularly in the tail at the 90th or above, tell a better story of how well and consistently you're responding to traffic under load.
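
To make that concrete, here is a rough sketch with made-up numbers (the latencies and the 50/50 bot split are assumptions for illustration, not data from any real system). It compares the mean and nearest-rank percentiles with and without near-empty bot queries mixed in.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds.
real_traffic = [200] * 45 + [2000] * 5   # 50 real transactions, 5 of them slow
bot_traffic = [1] * 50                   # 50 near-empty queries from a robot
all_traffic = bot_traffic + real_traffic

mean_real = statistics.mean(real_traffic)   # what users actually experience
mean_all = statistics.mean(all_traffic)     # dragged down by the bot queries

p99_real = percentile(real_traffic, 99)
p99_all = percentile(all_traffic, 99)

print(f"mean without bots: {mean_real} ms, with bots: {mean_all} ms")
print(f"p99 without bots: {p99_real} ms, with bots: {p99_all} ms")
```

In this toy data the bot queries cut the mean roughly in half, while the 99th percentile still lands on the slow requests either way.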

1 comment

How "recent" are your percentiles? I have found that calculating percentiles is a pretty CPU-heavy task. And if you have a giant Grafana dashboard querying every 30s, it can stress out your Prometheus/Graphite or whatever. But if you take a small sample, like the 95th percentile of latencies in the last 2 minutes, it's not really a very accurate representation either.

And of course there is the other problem of storing all your latencies accurately, which becomes pretty hard if you are using something like Prometheus.
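
A rough sketch of why storage accuracy matters, assuming a histogram-style store that keeps only cumulative bucket counts and estimates quantiles by linear interpolation inside a bucket (my understanding of how Prometheus-style histograms behave; the bucket bounds and latencies below are made up, and a real setup would also have a catch-all +Inf bucket):

```python
import bisect

# Hypothetical bucket upper bounds in milliseconds (not any real default).
BOUNDS = [10, 50, 100, 500, 1000, 5000]

def bucket_counts(samples, bounds):
    """Cumulative counts: entry i is the number of samples <= bounds[i]."""
    ordered = sorted(samples)
    return [bisect.bisect_right(ordered, b) for b in bounds]

def estimate_quantile(q, bounds, cumulative):
    """Estimate quantile q (0..1) by linear interpolation inside the bucket that holds it."""
    target = q * cumulative[-1]
    for i, count in enumerate(cumulative):
        if count >= target:
            lower = bounds[i - 1] if i > 0 else 0
            below = cumulative[i - 1] if i > 0 else 0
            in_bucket = count - below
            if in_bucket == 0:
                return bounds[i]
            return lower + (bounds[i] - lower) * (target - below) / in_bucket
    return bounds[-1]

# Made-up latencies: the slowest 5% of requests all take exactly 2000 ms.
latencies = [1] * 50 + [200] * 45 + [2000] * 5

cumulative = bucket_counts(latencies, BOUNDS)
estimated_p99 = estimate_quantile(0.99, BOUNDS, cumulative)

print("cumulative bucket counts:", cumulative)
print("estimated p99 from buckets:", estimated_p99, "ms (slowest raw sample: 2000 ms)")
```

Once only the bucket counts are kept, the store cannot tell where inside the 1000 to 5000 ms bucket those slow requests fell, so the estimate depends heavily on how the bucket boundaries were chosen.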