by the_arun 3138 days ago
How does Prometheus compare to Splunk?
1 comments

Splunk is an event logging system, whereas Prometheus is metrics-based. You need both types of systems to properly observe your systems; they're complementary.
> Splunk is an event logging system, whereas Prometheus is metrics-based. You need both types of systems to properly observe your systems; they're complementary.

While this is the traditional way of looking at them, I strongly disagree that metrics and logs are different toolsets, or that you would need both of them in order to properly observe your systems.

I've written and spoken about this approach before: https://medium.com/@chimeracoder/dont-read-your-logs-13586c7... and https://vimeo.com/221049715

> I strongly disagree that metrics and logs are different toolsets, or that you would need both of them in order to properly observe your systems.

And from the link:

> Logging can be useful for some purposes. However, it’s rare that they’re the only tool for monitoring your code. And it’s even rarer that they’re the best tool.

Metrics are a tool that takes a different approach from logs; once you get beyond small systems, you need both. I talked about this earlier in the year: https://www.youtube.com/watch?v=hCBGyLRJ1qo

https://youtu.be/hCBGyLRJ1qo?t=6m45s

[edit]

So I watched a few minutes into this, Brian, and it seems to me that either an expert system, or some form of rudimentary AI, that observes the monitoring system can be the driver of an intelligent alerting system. In other words, it seems 'alerts' are, in the final analysis, the higher value proposition.

And I fully agree with you: it really is a waste of talent to have engineers glued to screens watching graphs.

> Metrics are a tool that takes a different approach from logs; once you get beyond small systems, you need both. I talked about this earlier in the year:

Quite the opposite: the "some purposes" I'm talking about are precisely the small-scale ones. As scale grows, the use cases for logs and metrics converge, and metrics become a strictly better tool.

The question is about tracking and storing individual events (logs) with arbitrary per-item detail vs. dimensionally limited aggregations (time series / metrics). In either case, I think we agree that the data should be recorded in a structured way, and when I say "logs" I just mean a record of individual items, not of sampled/aggregated metrics.

Given that, you need both logs (individual events) and metrics. Logs give you crucial insight into individual interesting events such as single requests that bring your service down, but logs are orders of magnitude more expensive than metrics in tracking, storage, and processing. So that's why you use metrics for a much wider scope and for longer time periods.
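
To make the distinction concrete, here is a rough Python sketch (not how Splunk or Prometheus actually store data). The event fields, the `find_events` helper, and the choice of `(path, status)` as the metric dimensions are all made up for illustration:

```python
from collections import Counter

# Hypothetical request events; field names are illustrative only.
events = [
    {"ts": 1, "user": "u1", "path": "/home", "status": 200, "ms": 12},
    {"ts": 2, "user": "u2", "path": "/home", "status": 200, "ms": 15},
    {"ts": 3, "user": "u3", "path": "/buy", "status": 500, "ms": 950},
    {"ts": 4, "user": "u1", "path": "/buy", "status": 200, "ms": 40},
]

# "Logs": keep every individual event with all of its per-item detail.
# Storage grows with the number of events.
log_store = list(events)

# "Metrics": aggregate into counters keyed by a small, fixed set of
# dimensions. Storage grows with the number of distinct label combinations,
# not with the number of events, but per-event detail (user, ms) is gone.
request_count = Counter((e["path"], e["status"]) for e in events)


def find_events(store, **fields):
    """Return the stored events whose fields match all given values."""
    return [e for e in store if all(e.get(k) == v for k, v in fields.items())]
```

The counter can say that `/buy` returned a 500 once, but only the event store can say which request that was, e.g. `find_events(log_store, status=500)`.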

Not for high cardinality events like what happened to a particular user during a single session. Metrics will never help with that type of problem.
> Not for high cardinality events like what happened to a particular user during a single session. Metrics will never help with that type of problem.

No, and as I explain in both that article and the video, logs aren't the best solution for that use case either.