| HN Mirror

	by cratermoon 1763 days ago
	What if the next bug has nothing to do with the message queue? What if your last fix to improve the TPS left enough headroom that the next problem never triggers a high rate? What if your errors-per-minute metric only aggregates errors for the services that last had high error rates? The point: all those measures, and the dashboard created to monitor them, were likely put in place because some prior bug or outage was traced to not knowing those metrics. But the next problem might be something else, for which the metrics are not collected, aggregated, or displayed. Now you've got a dashboard with lots of information, but it's not showing any problems, and it's not providing any insight into why your customers are complaining and all the product people are in fire drill mode.