Hacker News
by packetslave 2527 days ago
A single host stuck at 100% CPU also has a nasty effect on your tail latency in a system with wide fanout. If a request hits 100 backend systems and 1 of them is slow, the request as a whole is only as fast as that slowest backend, so your 99th percentile latency is going to go in the toilet.
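To put a rough number on the fanout effect, here is a small back-of-the-envelope sketch. It assumes each backend call is independently slow with the same probability, which is a simplification (a host pinned at 100% CPU is slow on every call, not 1% of them). The function name and parameters are made up for illustration.

```python
def fraction_of_slow_requests(fanout: int, p_slow: float) -> float:
    """Probability that at least one of `fanout` backend calls is slow.

    Assumes each call is independently slow with probability `p_slow`
    (an illustrative simplification, not a model of any real system).
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be between 0 and 1")
    # The request is fast only if every single backend call is fast.
    return 1.0 - (1.0 - p_slow) ** fanout


if __name__ == "__main__":
    for n in (1, 10, 100):
        share = fraction_of_slow_requests(n, 0.01)
        print(f"fanout={n:3d}: {share:.1%} of requests see a slow backend")
```

With a 1% chance of a slow call per backend, a fanout of 100 means roughly 63% of requests wait on at least one slow backend, so the per-backend tail becomes the typical case for the whole request.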
1 comment

Which is a good reason to hedge and replicate but NOT a reason to alert on high CPU usage of single computers.
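For anyone unfamiliar with hedging, a minimal sketch of a hedged request looks something like this. It assumes the call is idempotent and that a second replica can serve the same request. `hedged_call`, `primary`, `backup`, and `hedge_delay` are hypothetical names, not any particular library's API.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def hedged_call(
    primary: Callable[[], Awaitable[Any]],
    backup: Callable[[], Awaitable[Any]],
    hedge_delay: float,
) -> Any:
    """Call `primary`; if it has not answered within `hedge_delay` seconds,
    also call `backup` and return whichever finishes first."""
    first = asyncio.ensure_future(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_delay)
    if done:
        # The primary answered in time, so no hedge is sent.
        return first.result()

    # The primary is slow: send the same request to a replica.
    second = asyncio.ensure_future(backup())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the loser so it stops consuming resources
    return done.pop().result()
```

Setting `hedge_delay` near the backend's normal high-percentile latency keeps the extra load small, since only the slowest few percent of calls ever trigger a second request.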
You definitely want to TRACK CPU usage on individual hosts, but, yeah, I would alert on service latency instead. Symptom, not cause.
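As a rough illustration of alerting on the symptom, the check below pages when the p99 of recent request latencies exceeds a latency objective, regardless of what any one host's CPU is doing. The function names, the nearest-rank percentile method, and the 500 ms threshold are all assumptions for the example; a real setup would normally do this in the monitoring system rather than in application code.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of `samples` (pct in the range 0 to 100)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_page(latencies_ms: Sequence[float], slo_ms: float, pct: float = 99.0) -> bool:
    """Page when the chosen latency percentile is above the objective."""
    return percentile(latencies_ms, pct) > slo_ms


if __name__ == "__main__":
    # Hypothetical window of request latencies: mostly fast, a slow tail.
    window = [10.0] * 98 + [900.0, 950.0]
    print(should_page(window, slo_ms=500.0))
```

Per-host CPU still goes on a dashboard so you can find the cause once the latency alert fires.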