|
|
|
|
|
by yibg
2527 days ago
|
|
Outliers are where the interesting stuff happens, and outliers happen to individual instances. Aggregates are useful but can be very misleading. You can have milliseconds 99 percentile latency with ~1% of requests timing out. I wouldn’t alert on a single machine having CPU issues, but I’m definitely interested in a small collection of individual machines all having CPU issues at the same time. |
|