	by chfritz 258 days ago
	You seem to be describing the problem of automated anomaly detection. Many companies have tried or are trying to solve it (e.g., Heex), but I don't think anyone has done so definitively. The issue is that "normal" behavior keeps changing, so it's difficult to build a model of what is abnormal. And by the time the behavior of the robots in the fleet becomes more stable (in all aspects: physical, electrical, networking, logging, etc.), it's usually easy for the engineers who built it to put in the right metrics and health-monitoring checks to detect issues. So even though automated anomaly detection sounds in theory like the holy grail of fleet observability, in practice it's not such a big deal. So to answer your question: I think yes, the second, better tooling (and a ton of metrics data collected from the fleet with good versioning).
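
A minimal sketch of what such hand-written health checks over versioned metrics might look like, in Python. Everything here is hypothetical and chosen only for illustration: the metric names, the thresholds, and the `MetricSample`, `HealthCheck`, and `evaluate` names do not come from any particular fleet or tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricSample:
    """One metric reading, tagged with the software version that produced it."""
    robot_id: str
    software_version: str
    name: str
    value: float


@dataclass(frozen=True)
class HealthCheck:
    """An explicit, engineer-written rule for a single metric."""
    metric: str
    description: str
    is_healthy: Callable[[float], bool]


# Hypothetical thresholds; in practice the engineers who built the robot pick these.
CHECKS = [
    HealthCheck("battery_voltage", "battery voltage at least 22.0 V", lambda v: v >= 22.0),
    HealthCheck("network_latency_ms", "network latency at most 200 ms", lambda v: v <= 200.0),
]


def evaluate(samples: list[MetricSample], checks: list[HealthCheck]) -> list[str]:
    """Return one message per sample that violates a check for its metric."""
    checks_by_metric: dict[str, list[HealthCheck]] = {}
    for check in checks:
        checks_by_metric.setdefault(check.metric, []).append(check)

    failures = []
    for sample in samples:
        for check in checks_by_metric.get(sample.name, []):
            if not check.is_healthy(sample.value):
                failures.append(
                    f"{sample.robot_id} (v{sample.software_version}): "
                    f"{check.description} violated, got {sample.value}"
                )
    return failures


# Made-up readings from two robots running different software versions.
samples = [
    MetricSample("robot-1", "1.4.0", "battery_voltage", 24.1),
    MetricSample("robot-2", "1.5.0", "battery_voltage", 21.3),
    MetricSample("robot-2", "1.5.0", "network_latency_ms", 350.0),
]
failures = evaluate(samples, CHECKS)
for message in failures:
    print(message)
```

Carrying the software version on every sample is what lets a failure be traced back to a specific release, which is the "good versioning" part of the point above.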