| HN Mirror



	by MiguelHudnandez 1095 days ago
	Broad alerts are really good to have. Narrow metrics are great to have once something goes wrong. When a server does go down, what did CPU, memory, and disk IO look like? Did the request count climb quickly before the outage? Having those other metrics helps with speedy troubleshooting: is it a software problem that got out of control, or did some piece of hardware die or get throttled? I'm of the opinion that having charts and graphs to rely on lets you focus troubleshooting resources more quickly on the most actionable areas.