|
In practice, if your system is complicated and you have to look at the visualization, you are already in trouble. For anything complicated, you need exactly the inputs you describe, but everything has to be processed already by another layer that can give you higher-level ideas. This is a place where I think you guys could beat what other 3rd-party monitoring tools are doing. I work with some of your guest bloggers, and I work on a subsystem with its own dashboard: about 50 charts. To make onboarding new teammates a sensible experience, we need both a layer of alerts on top of the charts and a set of rules of thumb, which should be programmed in if the alerting system were good enough, that put the alerts together into realistic failure cases: if X and Y triggered but Z didn't, then this piece is probably the culprit. There are also opportunities in visualizations that aren't chart-based. We used to have something like that for another complex system at another employer, but that's expensive, custom work unless you join forces with something that understands where all your services are, knows all the ingress and egress rules, and thus could automatically generate a picture of your system, along with understanding the instrumentation. So leave that until you merge with SkylinerHQ or something. That said, I think you guys are heading towards a good, marketable product as it is. Fixing the annoying statsd/splunk divide of older monitoring would probably make us buy it already. |
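
To make the "if X and Y triggered, but Z didn't" idea concrete, here is a rough sketch of what such a rules-of-thumb layer could look like. Everything in it is made up for illustration: the alert names `X`, `Y`, `Z` come from the example above, while the component names (`queue-consumer`, `upstream-gateway`) and the `Rule` / `likely_culprits` helpers are hypothetical and not part of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One rule of thumb: which alerts must fire, and which must stay quiet."""
    culprit: str          # hypothetical component name to blame
    fired: frozenset      # alerts that must be active
    quiet: frozenset      # alerts that must NOT be active

    def matches(self, active):
        # All required alerts are active, and none of the "quiet" ones are.
        return self.fired <= active and not (self.quiet & active)


# Hypothetical rules; a real team would encode its own failure cases here.
RULES = [
    Rule("queue-consumer", frozenset({"X", "Y"}), frozenset({"Z"})),
    Rule("upstream-gateway", frozenset({"X", "Y", "Z"}), frozenset()),
]


def likely_culprits(active_alerts, rules=RULES):
    """Return the culprits of every rule that matches the active alerts."""
    active = frozenset(active_alerts)
    return [rule.culprit for rule in rules if rule.matches(active)]


if __name__ == "__main__":
    print(likely_culprits({"X", "Y"}))
    print(likely_culprits({"X", "Y", "Z"}))
```

The point of keeping the "must stay quiet" set explicit is that the absence of an alert carries as much information as its presence, which is exactly what a new teammate staring at 50 charts does not know yet.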