Hacker News
by jerrac 1001 days ago
Not sure how I haven't run across it before, but this is the first time I've tried using Netdata. Looks like it is very good for metrics, at least in the 10 minutes I have spent installing it on my local desktop and poking around the UI there.

I'm not seeing anything in it for logs, though. I'm guessing it doesn't aggregate or do anything with logs? What do you use for log aggregation and analysis?

I'm very interested because I've been getting frustrated with the ELK Stack, and the Prometheus/Grafana/Loki stack has never worked for me. I'm really close to trying to reinvent the wheel...

6 comments

If you want a system for logs that is easy to install, maintain, and use, then take a look at VictoriaLogs [1], which I'm working on. It is just a single relatively small binary (around 10MB) without external dependencies. It supports both structured and unstructured logs. It provides an intuitive query language, LogsQL [2]. It integrates well with good old command-line tools (such as grep, head, jq, wc, sort, etc.) via unix pipes [3].

[1] https://docs.victoriametrics.com/VictoriaLogs/

[2] https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html

[3] https://docs.victoriametrics.com/VictoriaLogs/querying/#comm...

Prometheus has become ubiquitous for a reason. Exporting metrics on a basic http endpoint for scraping is as simple as you can get.
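To make the "basic http endpoint" idea concrete, here is a minimal sketch using only the Python standard library. The metric name `demo_scrapes_total`, the helper names, and the port in the usage note are made up for illustration; a real service would more likely use an official Prometheus client library than hand-format the output.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter, only here to have something to export.
STATE = {"scrapes": 0}


def render_metrics(scrapes: int) -> str:
    """Render one counter in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_scrapes_total Number of times /metrics was served.",
        "# TYPE demo_scrapes_total counter",
        f"demo_scrapes_total {scrapes}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the conventional /metrics path is served.
        if self.path != "/metrics":
            self.send_error(404)
            return
        STATE["scrapes"] += 1
        body = render_metrics(STATE["scrapes"]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def run(port: int) -> None:
    """Serve /metrics on localhost until the process is stopped."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `run(8000)` and pointing a scrape job (or just `curl`) at `http://127.0.0.1:8000/metrics` would be enough to see the counter go up by one on every request.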

Service discovery adds some complexity, but if you’re operating with any amount of scale that involves dynamically scaling machines then it’s also the simplest model available so far.

What about it doesn’t work for you?

Edit: I didn’t touch on logging because the post is about metrics. Personally I’ve enjoyed using Loki better than ELK/EFK, but it does have tradeoffs. I’d still be interested to hear why it doesn’t work, so I can keep that in mind when recommending solutions in the future.

Last time I tried Prometheus was years ago. So I don't know how much might have changed... I gave it a good month or two of effort trying to get the stack to do what I needed and never really succeeded.

Just my opinion, but I honestly don't think the scraping model makes much sense. It requires you to expose extra ports and paths on your servers that the push model doesn't require. I'm not a fan of the extra effort required to keep those ports and paths secure.

Beyond that, PromQL is an extra learning curve that I didn't like. I still ran into disk space issues when I used a proper data backend (TimescaleDB). Configuring all the scrapers was overly complicated, and so was making sure to deploy all the collectors and the configuration they needed.

In comparison, deploying Filebeat and Metricbeat is super simple, just configure the yaml file via something like Ansible and you're done. Elastic Agent is annoying in that you can't do that when using Fleet, or at least I have yet to figure out how to automate it. But it's still way easier than the Prometheus stack.

I've tried to get Loki to work 2 or 3 times. Never have really succeeded. I think I was able to browse a few log lines during one attempt, I don't think I even got that far in the other attempts... The impression I came away with was that it was designed to be run by people with lots of experience with it. Either that, or it just wasn't actually ready to be used by anyone not actively developing it.

So, yeah, while I figure a lot of people do well with the Prometheus/Grafana/Loki stack, it just isn't for me.

The most basic setup, and the one typically used until you need something more advanced, is using Prometheus both for scraping and as the TSDB backend. If you ever decide to revisit Prometheus, you’ll likely have better luck starting with this approach, rather than implementing your own scraping or involving TimescaleDB at all (at least until you have a working monitoring stack).

There used to be a connector called Promscale that was for sending metrics data from Prometheus to Timescale (using Prometheus’ remote_write) but it was deprecated earlier this year.

Also important to add: using Prometheus as the TSDB is good for short-term use (on the order of days to months). For longer retention you could offload it elsewhere, like another Prometheus-based backend or something else SQL-based, etc.

Hey, I work on ML at Netdata (disclaimer).

We have a big PR open and under review at the moment that brings in a lot more logs capabilities: https://github.com/netdata/netdata/pull/13291

We also have some specific logs collectors too. I think the link below might be the best place to look around at the moment; it should take you to the logs part of the integrations section in our demo space (no login needed, sorry for the long horrible URL, we are adding this section to our docs soon but at the moment it only lives in the app).

https://app.netdata.cloud/spaces/netdata-demo/rooms/all-node...

Nice to see that the log analysis is being worked on.

I'll see if I can figure out the integrations you pointed out. They look more like they are aimed at monitoring the metrics of the tools, not using the tools to aggregate logs. Right?

The way most ops systems treat logs and metrics as completely separate areas has always struck me as odd. Both are related to each other, and having them in the same system should be default. That's why I've put as much effort into the ELK Stack as I have. They've seemed to be the only ones who have really grasped that idea. (Though it's been a year or two since I've really surveyed the space...)

One question that's not log related: is it required to sign up for a cloud account to get multiple nodes displaying on the same screen? From the docs on streaming, I think you can configure nodes to send data to a parent node without a cloud account, but I either haven't configured it properly yet, or something else is in the way, since the node I'm trying to set up as a parent isn't showing anything from the child node.

FYI, you need to add the api-key config section to the stream.conf file on the parent node in order to enable the api key and allow child nodes to send data to the parent node. I thought it went into the netdata.conf file... I also kinda wonder why it matters what file has what config since the different config sections all have section headings like `[stream]` or `[web]`.

So, the answer to my question is that you can get multiple nodes showing up without a cloud account. Just have to configure it correctly.
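For anyone else tripping over the same thing, here is a rough sketch of the two `stream.conf` fragments involved, written as a small Python helper so the pieces are easy to template (for example from a deployment script). The section and option names follow my reading of the Netdata streaming docs, so treat them as an assumption and check them against your version; the API key and parent address in the usage note are placeholders.

```python
def parent_stream_conf(api_key: str) -> str:
    """stream.conf fragment for the parent: a section named after the API key.

    Assumption: the parent accepts children once a section with the key as
    its name exists and is enabled.
    """
    return f"[{api_key}]\n    enabled = yes\n"


def child_stream_conf(api_key: str, parent_addr: str) -> str:
    """stream.conf fragment for the child: where to send data and which key to use.

    Assumption: the [stream] section takes 'destination' and 'api key' options.
    """
    return (
        "[stream]\n"
        "    enabled = yes\n"
        f"    destination = {parent_addr}\n"
        f"    api key = {api_key}\n"
    )
```

Something like `print(child_stream_conf("11111111-2222-3333-4444-555555555555", "192.0.2.10:19999"))` gives the child side, and the same key passed to `parent_stream_conf` gives the section that has to go into `stream.conf` (not `netdata.conf`) on the parent.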

I have used https://github.com/openobserve/openobserve in several hobby projects and liked it. It's an all-in-one solution. It's likely less featureful than many others, but a single binary with everything in one place pulled me in, and it has worked for me so far.

Not affiliated, I just like the tool.

I'm not sure if the version in use at $workplace is out of date or incorrectly configured, but it is a dreadful Prometheus client in that it doesn't use labels. It just shovels all the metadata into the metric name like a 1935 style Graphite install, making most of the typical Prometheus goodness impossible to use.
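To illustrate the difference, here is a small sketch that formats the same sample both ways. The metric names, label names, and the `render_sample` helper are invented for the example and are not what Netdata actually emits; label values are also not escaped here, which a real exporter would need to do.

```python
def render_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Format one sample in the Prometheus text exposition format."""
    if labels:
        # Sort the labels so the output is stable across runs.
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


# Graphite style: every dimension is baked into the metric name.
flat = render_sample("disk_sda_reads_host_web01", {}, 42)

# Prometheus style: one metric name, dimensions carried as labels.
labelled = render_sample("disk_reads", {"device": "sda", "host": "web01"}, 42)

print(flat)      # disk_sda_reads_host_web01 42
print(labelled)  # disk_reads{device="sda",host="web01"} 42
```

With the labelled form, a single query such as `sum by (host) (disk_reads)` can aggregate across devices. With the flat form, every host and device combination is a separate metric name, so that kind of grouping has to be done by pattern-matching on names.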

The little dashboard thing is nice, though.

In my experience, there are no silver bullets. Let metrics software do metrics and log software do logs.

At the very least at the database level. Maybe we will get a visualisation engine that merges both nicely, but database-wise the two types of data couldn't be more different.