> > Streams are the superior approach from my point of view, as they allow for realtime approaches and stream (window-based) analytics.
>
> I'd see them as slightly different approaches to providing fundamentally the same solution. One builds up time series and then operates on them, the other operates on the time series as they come in. Taking Prometheus as an example, we're a time series database, and you can do both realtime and window-based analysis. In fact, that's how it is usually used.
>
> > I would say that SignalFx is the most sophisticated
>
> Do you have an example of something that you can do with your streaming approach that's not possible with other tools? It's hard to get a proper understanding of the myriad of monitoring systems out there, so I'm always looking for insights.
>
> > Our agent automatically discovers all the components and dependencies and adds them to the graph in realtime.
>
> That sounds interesting. How do you do that for network dependencies? Do you have something like Zipkin?
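To illustrate the "builds up time series and then operates on them" versus "operates on the time series as they come in" distinction, here is a minimal Python sketch computing the same sliding-window mean both ways. The function and class names are hypothetical and not taken from Prometheus, SignalFx or any other product; the point is only that both styles can produce identical window-based results, differing in what they keep in memory and when they compute.

```python
from collections import deque
from typing import Deque, List, Optional


def batch_window_means(samples: List[float], window: int) -> List[float]:
    """Store first, query later: compute every full-window mean over a stored series."""
    return [
        sum(samples[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(samples))
    ]


class StreamingWindowMean:
    """Operate as samples arrive: keep only the last `window` values and a running sum."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.buf: Deque[float] = deque()
        self.total = 0.0

    def push(self, value: float) -> Optional[float]:
        self.buf.append(value)
        self.total += value
        if len(self.buf) > self.window:
            self.total -= self.buf.popleft()  # evict the oldest sample
        # Only emit a mean once the window is full, matching the batch version.
        return self.total / self.window if len(self.buf) == self.window else None


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
batch = batch_window_means(samples, 3)
stream = StreamingWindowMean(3)
streamed = [m for m in (stream.push(x) for x in samples) if m is not None]
print(batch, streamed)  # both are [2.0, 3.0, 4.0, 5.0]
```

The streaming version uses constant memory per series and yields a result the moment a sample arrives; the batch version needs the stored series but can recompute any window after the fact.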
My point was more about the framework you get and how easy it is to apply analytics to streams/queries. SignalFx seems to have a nice workbench for this, with direct visual feedback in the UI, so you can iterate on existing data until you get the right result.
As I said, we at Instana think that most people will not be able to build a sophisticated monitoring solution with these kinds of frameworks, as they don't have the time to do it and may not even have the analytical domain knowledge. You can see that SignalFx is adding specific knowledge for some technologies. Here are two simple examples to show that it is not easy:
- How would you predict whether a file system is running out of disk space?
- How would you predict whether you should add a node to a Cassandra cluster because it is running out of capacity (adding a node can take a serious amount of time, so you need to know in advance)?
Even the disk space problem is hard to solve: linear regression and other basic algorithms will not work.
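To make the disk-space point concrete, here is a minimal Python sketch of the naive approach: fit a least-squares line to recent usage and extrapolate it to capacity (similar in spirit to Prometheus's `predict_linear()`, which uses simple linear regression). The disk trace and the `hours_until_full` helper are hypothetical. Two windows over the same disk give opposite answers because a periodic cleanup job breaks the linear assumption.

```python
from typing import Optional, Sequence


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def hours_until_full(
    hours: Sequence[float], used_gb: Sequence[float], capacity_gb: float
) -> Optional[float]:
    """Naive linear extrapolation: when does the fitted trend line reach capacity?"""
    slope = least_squares_slope(hours, used_gb)
    if slope <= 0:
        return None  # the trend says the disk never fills up
    mean_x = sum(hours) / len(hours)
    mean_y = sum(used_gb) / len(used_gb)
    fitted_now = mean_y + slope * (hours[-1] - mean_x)  # trend value at the last sample
    return (capacity_gb - fitted_now) / slope


# Synthetic trace (hypothetical): a 100 GB disk grows 10 GB/h and a cleanup
# job frees 60 GB between hour 3 and hour 4.
hours = [0, 1, 2, 3, 4, 5]
used = [60, 70, 80, 90, 30, 40]

print(hours_until_full(hours[0:4], used[0:4], 100))  # 1.0: "full within the hour"
print(hours_until_full(hours[2:6], used[2:6], 100))  # None: "never fills up"
```

The first window predicts the disk is full within an hour; the second, one hour later on the same disk, predicts it never fills up. Neither answer reflects the actual sawtooth behavior.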
Now think of hundreds (or thousands) of services running on a dynamic container platform, with new services released daily or even every minute, and lots of different technologies involved...
No question that you can build a good monitoring solution with Prometheus, SignalFx, Datadog etc., but it will take a serious amount of time, consulting, and dev teams adding the right instrumentation, metrics etc. And you need a lot of analytical knowledge. I can even imagine situations where tools like Prometheus are the better choice, especially if you have a very strict set of technologies and communication frameworks, and really good people to write a very specific set of "rules" for that environment.
We've added a domain model to our product: our Dynamic Graph. All the products mentioned have a generic metric model, but no semantics describing servers, containers, processes, services and their communication, which is the domain of system and application monitoring.
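As a rough illustration of why such semantics matter: once entities have types and typed relationships, a question like "what is affected if this host fails?" becomes a graph traversal instead of a hand-written rule. This sketch is hypothetical (the `Entity` and `DependencyGraph` names, the relation labels and the entity names are all made up for illustration) and is not how Dynamic Graph is actually implemented.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "host", "container", "process", "service"
    name: str


class DependencyGraph:
    def __init__(self) -> None:
        # dependents[x] holds (relation, y) pairs where y depends on x.
        self.dependents: Dict[Entity, Set[Tuple[str, Entity]]] = defaultdict(set)

    def add(self, child: Entity, relation: str, parent: Entity) -> None:
        """Record that `child` depends on `parent` via `relation` (e.g. "runs_on", "calls")."""
        self.dependents[parent].add((relation, child))

    def impacted_by(self, failed: Entity) -> Set[Entity]:
        """Breadth-first walk of everything that transitively depends on `failed`."""
        seen: Set[Entity] = set()
        queue = deque([failed])
        while queue:
            current = queue.popleft()
            for _, dep in self.dependents.get(current, ()):
                if dep not in seen:
                    seen.add(dep)
                    queue.append(dep)
        return seen


# Hypothetical topology: host -> container -> JVM process -> Cassandra, called by checkout.
host = Entity("host", "node-1")
container = Entity("container", "c-42")
jvm = Entity("process", "jvm")
cassandra = Entity("service", "cassandra")
checkout = Entity("service", "checkout")

g = DependencyGraph()
g.add(container, "runs_on", host)
g.add(jvm, "runs_in", container)
g.add(cassandra, "implemented_by", jvm)
g.add(checkout, "calls", cassandra)

print(sorted(e.name for e in g.impacted_by(host)))
```

A generic metric model only sees unrelated series tagged "node-1" or "checkout"; the typed edges are what let a host problem be connected to the service that calls Cassandra.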
And yes, we use something very similar to Zipkin to get the dependencies between services. Here are two blog posts describing the approach:
- About distributed tracing: https://www.instana.com/blog/evolution-tracing-application-p...
- How we safely instrument code: https://www.instana.com/blog/how-instana-safely-instruments-...
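For readers unfamiliar with the Zipkin-style idea, here is a minimal sketch of deriving caller -> callee service dependencies from parent/child spans: whenever a span's parent belongs to a different service, that is one call between the two services. The `Span` fields loosely mirror Zipkin's trace ID, span ID, parent ID and service name; the `dependency_links` helper and the sample spans are hypothetical, and this is not Instana's implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str


def dependency_links(spans: List[Span]) -> Counter:
    """Count caller -> callee service edges from parent/child spans across traces."""
    by_id: Dict[Tuple[str, str], Span] = {(s.trace_id, s.span_id): s for s in spans}
    links: Counter = Counter()
    for span in spans:
        if span.parent_id is None:
            continue  # root span: nobody called it within this trace
        parent = by_id.get((span.trace_id, span.parent_id))
        # Spans within the same service are internal work, not a dependency.
        if parent is not None and parent.service != span.service:
            links[(parent.service, span.service)] += 1
    return links


# Hypothetical traces: frontend -> checkout -> cassandra, plus a second frontend call.
spans = [
    Span("t1", "a", None, "frontend"),
    Span("t1", "b", "a", "checkout"),
    Span("t1", "c", "b", "cassandra"),
    Span("t1", "d", "b", "checkout"),  # internal span, same service as its parent
    Span("t2", "e", None, "frontend"),
    Span("t2", "f", "e", "checkout"),
]
links = dependency_links(spans)
print(links)
```

Aggregating these edges over many traces yields the service dependency map, which is what allows network dependencies to be added without manual configuration.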
Mirko