
by dozzie 3411 days ago

> It wasn't a "we do this at scale" talk, but I'd love to see more experiments like it.

Well, I will be conducting such an experiment in the near future. Of the ELK stack, I never used Logstash in the first place and used Fluentd instead (and now I'm using a mixture of my own data forwarder and Fluentd as a hub). I'm planning mainly to replace Elasticsearch, and will probably settle on a command line client for reading, searching, and analyzing (I dislike writing web UIs).

All this because I'm tired of elastic.co. I can't upgrade my Elasticsearch 1.7.5 to the newest version, because then I would need to upgrade this small(ish) 4MB Kibana 3.x to a monstrosity that weighs more than the whole Elasticsearch engine itself, for no good reason at all. And now that I'm stuck with ES 1.x, it's only somewhat stable: it can hang for no apparent reason at unpredictable intervals, sometimes three times in a week, sometimes running without a problem for two months. And to add insult to injury, processing logs with grep and awk (because I store the logs in flat files as well as in ES) is often faster than letting ES do the job. I only keep ES around because Kibana gives a nice search interface and ES provides a declarative query language, which is easier to use than writing an awk program.

> He even goes as far as implementing a minimal logstash equivalent (i.e. log parsing) into the database itself.

As for parsing logs, I would stay away from the database. Logs should be parsed earlier and made available for machine processing as a stream of structured messages. I have implemented such a thing using Rainer Gerhards' liblognorm and I'm very happy with the results, to the point that I derive some monitoring metrics from logs and used to collect inventory from them as well.
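To illustrate what "a stream of structured messages" buys you downstream, here is a minimal Python sketch of a consumer that derives a metric from such a stream. It assumes the messages arrive as one JSON object per line; the `event`, `user`, and `host` field names and the `count_events` helper are invented for this example and are not taken from liblognorm or any real setup.

```python
import json
from collections import Counter
from typing import Iterable


def count_events(json_lines: Iterable[str], field: str = "event") -> Counter:
    """Count messages per value of `field` in a stream of JSON objects, one per line."""
    counts: Counter = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        message = json.loads(line)
        counts[message.get(field, "<missing>")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical messages, as an upstream parser might emit them.
    stream = [
        '{"event": "login_failed", "user": "root", "host": "web1"}',
        '{"event": "login_failed", "user": "admin", "host": "web1"}',
        '{"event": "session_opened", "user": "alice", "host": "web2"}',
    ]
    for event, count in count_events(stream).most_common():
        print(event, count)
```

Because the parsing already happened upstream, the consumer needs no regexps at all; it only picks fields out of each message.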

3 comments

awj 3411 days ago

> I can't upgrade my Elasticsearch 1.7.5 to the newest version, because then I would need to upgrade this small(ish) 4MB Kibana 3.x to a monstrosity that weighs more than whole Elasticsearch engine

...is that really a good reason to reinvent this whole solution, though? You're basically saying you're going to spend the time to replace your entire log storage/analysis system because you object to the disk size of Kibana. (Which, without knowing your platform specifically, looks like it safely sits under 100 megs).

The rest of your complaints seem to stem from not having upgraded elasticsearch, aside from possibly hitting query scenarios that continue to be slower-than-grep after the upgrade.

Maybe I'm misunderstanding your explanation, but if I'm not this sounds like a lot of effort to save yourself tens of megs of disk space.


dozzie 3411 days ago

> ...is that really a good reason to reinvent this whole solution, though?

The system being dependency-heavy and pulling in an operationally awful stack (Node)? Yes, this alone is enough of a reason for me. And I haven't yet mentioned other important reasons: memory requirements and processing speed (both less than satisfactory), flexibility of processing (ES is mostly a query-based tool, and whatever pre-defined aggregations it has, that paradigm is too constrained for processing streams of logs), and me wanting to take a shot at log storage, because our industry doesn't actually have any open source alternative to Elasticsearch.

> Kibana. (Which, without knowing your platform specifically, looks like it safely sits under 100 megs).

Close, but missed. It's 130MB unpacked.

> Maybe I'm misunderstanding your explanation, but if I'm not this sounds like a lot of effort to save yourself tens of megs of disk space.

I'm fed up with the outlook of the whole thing. Here it's ridiculous disk space for what the thing does, there it's slower-than-grep search speed, in another place it barely keeps up with the rate I'm throwing data at it (a single ES instance should not lose its breath under just hundreds of megabytes per day), and then there's the upgrade that didn't make things faster or less memory-consuming, but failed to accept my data stream (I was ready to patch Kibana 3.x for ES 5.x, but then I got bitten twice in surprising, undocumented ways and gave up, because I lost my trust that it won't bite me again).

Sorry, but no, I don't see Elasticsearch as a state-of-the-art product. I would gladly see some competition for log storage, but all our industry has now is SaaS or paid software. I'm unhappy with this setting and that's why I want to write my own tool.


einhverfr 3411 days ago

The problem you run into is "we need some more information that is in the logs but we didn't think to parse before." Here PL/Perl is awesome because you can write a function, index the output, and then query against the function output.

One reason I always store full source data in the db.
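The function-plus-index idea can be sketched in Python with SQLite standing in for PostgreSQL and PL/Perl, since SQLite also allows an index on the output of a deterministic user-defined function. This is a swapped-in illustration, not the setup described above: the `logs` table, the `extract_user` function, and the `user=` log format are all invented for the example, and it needs Python 3.8+ with a reasonably recent SQLite.

```python
import re
import sqlite3
from typing import List, Optional

# Hypothetical field that nobody thought to parse at ingest time.
USER_RE = re.compile(r"user=(\S+)")


def extract_user(raw: str) -> Optional[str]:
    """Pull the user name out of a raw log line, or return None if absent."""
    match = USER_RE.search(raw)
    return match.group(1) if match else None


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # A function used in an index must be flagged deterministic, and it must
    # be registered on every connection that reads or writes the table.
    conn.create_function("extract_user", 1, extract_user, deterministic=True)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs (id INTEGER PRIMARY KEY, raw TEXT NOT NULL)"
    )
    # Index the function output, computed from the full raw line.
    conn.execute("CREATE INDEX IF NOT EXISTS logs_user ON logs (extract_user(raw))")
    return conn


def lines_for_user(conn: sqlite3.Connection, user: str) -> List[str]:
    """Query against the function output rather than a pre-parsed column."""
    rows = conn.execute(
        "SELECT raw FROM logs WHERE extract_user(raw) = ? ORDER BY id", (user,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_db()
    db.executemany(
        "INSERT INTO logs (raw) VALUES (?)",
        [("login ok user=alice",), ("login ok user=bob",), ("disk check passed",)],
    )
    print(lines_for_user(db, "alice"))
```

The point of keeping the raw line is visible here: a new extraction function can be added later and indexed without re-ingesting anything.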


dozzie 3411 days ago

> The problem you run into is "we need some more information that is in the logs but we didn't think to parse before."

Agreed, though with liblognorm rules you just shove every single variable field into a JSON field and that mostly does the job. And in case you were talking about logs with no matching rules: liblognorm reports all unparsed logs, and my logdevourer sends them along with the properly parsed logs, so no data is actually omitted.
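A rough Python sketch of that pass-through behaviour, for illustration only. It uses plain regexps rather than liblognorm rules, and it is not how logdevourer is implemented; the `RULES` table, the event names, and the output shape are assumptions made up for this example.

```python
import json
import re
from typing import Iterable, Iterator

# Hypothetical rules: (event name, regex whose named groups become fields).
RULES = [
    ("login_failed", re.compile(r"^Failed password for (?P<user>\S+) from (?P<ip>\S+)$")),
    ("disk_full", re.compile(r"^No space left on device: (?P<path>\S+)$")),
]


def normalize(lines: Iterable[str]) -> Iterator[dict]:
    """Emit one message per input line; lines matching no rule are kept verbatim."""
    for line in lines:
        line = line.rstrip("\n")
        for event, pattern in RULES:
            match = pattern.match(line)
            if match:
                # Every variable field goes into the structured message.
                yield {"event": event, "fields": match.groupdict()}
                break
        else:
            # No rule matched: forward the raw line so nothing is lost.
            yield {"event": None, "unparsed": line}


if __name__ == "__main__":
    sample = [
        "Failed password for root from 10.0.0.5\n",
        "kernel: something nobody wrote a rule for\n",
    ]
    for message in normalize(sample):
        print(json.dumps(message))
```

Parsed and unparsed messages travel in the same stream, so a consumer can filter on the `unparsed` key later to find lines that still need rules.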


renesd 3411 days ago

Thanks for the tip about liblognorm. Looks quite useful!


dozzie 3411 days ago

Oh yes it is. The rules syntax is nice and is a big improvement over regexps that are popular with almost every other log parser out there, but the best thing is that if your rules fail, liblognorm reports precisely what part of the log could not be consumed, not just the fact that none of the rules matched.

Liblognorm has only one major user: rsyslog, for which it was written, but at some point I thought that it would be nice to have a separate daemon that only parses logs, so I wrote logdevourer (https://github.com/korbank/logdevourer).
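To show why reporting the unconsumed part is more useful than a bare "no match", here is a toy left-to-right matcher in Python. It is not liblognorm's rule syntax or algorithm; the rule representation (a list of literals and `(name, regex)` pairs) and the `apply_rule` helper are invented for this sketch.

```python
import re
from typing import Dict, List, Tuple, Union

# A rule is a sequence of literal strings and (field_name, regex) pairs.
Rule = List[Union[str, Tuple[str, str]]]


def apply_rule(rule: Rule, line: str) -> Tuple[Dict[str, str], str]:
    """Consume `line` left to right; return (fields so far, unconsumed remainder)."""
    fields: Dict[str, str] = {}
    pos = 0
    for part in rule:
        if isinstance(part, str):
            if not line.startswith(part, pos):
                break  # literal text did not match at this position
            pos += len(part)
        else:
            name, pattern = part
            match = re.compile(pattern).match(line, pos)
            if match is None:
                break  # field pattern did not match at this position
            fields[name] = match.group(0)
            pos = match.end()
    return fields, line[pos:]


if __name__ == "__main__":
    # Hypothetical rule for an sshd-style message.
    rule: Rule = [
        "Accepted password for ", ("user", r"\S+"),
        " from ", ("ip", r"\d+\.\d+\.\d+\.\d+"),
    ]
    for text in (
        "Accepted password for alice from 10.0.0.5",
        "Accepted password for alice via console",
    ):
        fields, rest = apply_rule(rule, text)
        print(fields, repr(rest))
```

An empty remainder means the whole line was consumed; a non-empty one points at the exact spot where the rule stopped fitting, which is what makes a failing rule quick to fix.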
