Hacker News
by makapuf 3730 days ago
What about simple rsyslog? I keep stumbling on this kind of program (others have mentioned Heka, Fluentd, Logstash), but the general speed, simplicity, versatility (the feature range is actually quite big, from ES output to Unix pipes to simple filters) and ubiquity of rsyslog make it suited for many of these tasks. Am I missing something?
7 comments

Don't forget that rsyslog can also parse and generate structured data (JSON with mmjsonparse for input, plus templates with JSON escaping for output).

It can also queue up messages in memory and/or to disk if your remote data sink is having a hiccup.

And for those wondering how to send multiline data: well, you don't. If you need to write out a big blurb, you write it out on a single line from the application. If using structured data, you can output lines as separate array items. The current built-in limit for a syslog line is 8096 bytes, but that's tunable. Just make sure the thing that writes to syslog doesn't have a low hardcoded limit, like older versions of logger from util-linux (1024 bytes).
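To make the "single line from the application" idea concrete, here is a minimal Python sketch that turns an exception into one JSON line, with the stack trace as separate array items. The function name, the field names `msg` and `trace`, and the `@cee: ` prefix (which I understand mmjsonparse looks for by default) are assumptions for illustration, not something taken from a particular setup.

```python
import json
import traceback

# Assumption: mmjsonparse is configured to expect this cookie before the JSON.
CEE_COOKIE = "@cee: "


def format_exception_as_line(exc: BaseException) -> str:
    """Render an exception as a single line of JSON, trace lines as an array."""
    lines = []
    for chunk in traceback.format_exception(type(exc), exc, exc.__traceback__):
        # Each chunk may contain several newline-separated lines; split them
        # so every physical line becomes one array item.
        lines.extend(chunk.rstrip("\n").split("\n"))
    payload = {"msg": str(exc), "trace": lines}  # hypothetical field names
    # json.dumps escapes any remaining newlines, so the result is one line.
    return CEE_COOKIE + json.dumps(payload)


def fits_in_syslog_line(line: str, limit: int = 8096) -> bool:
    """Check the encoded size against the line limit mentioned above."""
    return len(line.encode("utf-8")) <= limit
```

The resulting string can be handed to whatever writes to syslog; if `fits_in_syslog_line` returns False, either raise the limit on the rsyslog side or trim the trace before sending.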

edit: the version of rsyslog shipped with the distros might be a bit dated. The rsyslog project provides packages for its latest stable version; we're using those and they work pretty well.

Have fun getting multi-line stack traces through rsyslog unmangled.
Well, for starters, I'd rather have a structured log entry that can carry context than a single line I have to grep for information. Context in the logs allows easier indexing and searching, and I can add more data to a log entry knowing I'm making it easier to search, not harder to grep.
It's easy to get a stream of messages out of Fluentd in raw(ish?) form or to write a message destination plugin for it. This makes Fluentd an excellent message forwarder for generic data. On the rsyslog side, you can't get a line-wise stream of JSON messages passed to a TCP or UNIX socket or through a pipe to a command, and writing a plugin for it takes some C code.
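For anyone unsure what a "line-wise stream of JSON messages" buys you on the receiving end, here is a rough Python sketch of a consumer. It works on any text stream (a pipe's stdin, or a socket wrapped with `makefile()`). The function name and the `host`/`msg` fields are made up for illustration, and this says nothing about how either Fluentd or rsyslog produces the stream.

```python
import io
import json
from typing import Iterator, TextIO


def read_json_lines(stream: TextIO) -> Iterator[dict]:
    """Yield one message per non-empty line; wrap anything that isn't a JSON object."""
    for raw in stream:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between messages
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            message = None
        if isinstance(message, dict):
            yield message
        else:
            # Keep the stream going on a malformed or non-object line.
            yield {"unparsed": raw}


# In-memory stream standing in for a socket or pipe (sample data is invented).
demo = io.StringIO('{"host": "web1", "msg": "started"}\n\nnot json\n')
messages = list(read_json_lines(demo))
```

Because each message is exactly one line, the consumer needs no framing logic beyond iterating over lines, which is why this format is convenient for generic data.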

I wouldn't build a monitoring or inventory system on rsyslog, but I don't hesitate to use Fluentd. rsyslog was intended for logs only, and using it in any other way seems an abuse, even if smart and somewhat fitting.

I haven't used logstash, but I bet it operates in a similar way on its data sink border.

We would switch away from Rsyslog in a heartbeat if someone could come up with a better syslog-compatible forwarder.

We have it set up to write logs locally (with a limited rotation) as well as forward them via TLS to a central Rsyslog server that collects the log in a single tree with a much longer retention time. (We don't use any of the non-file outputs, but we do sync to S3 for archival.)

It has major issues. For one, its spooling implementation is flaky. /dev/log is a limited, synchronous-blocking FIFO buffer, which means that everything that logs (including OpenSSH!) will choke if the buffer is full. For some reason, just a tiny bit of packet loss will throw Rsyslog off.
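This is not a fix for Rsyslog itself, but an application can at least avoid blocking its own threads on a full /dev/log. Below is a minimal sketch using Python's standard `QueueHandler` and `QueueListener`; the helper name is hypothetical, and the unbounded queue is a deliberate simplification (memory grows if the sink stalls for a long time).

```python
import logging
import logging.handlers
import queue


def make_nonblocking_logger(name: str, target_handler: logging.Handler):
    """Return (logger, listener); log calls only enqueue, a thread does the I/O."""
    log_queue: queue.Queue = queue.Queue()  # unbounded: an assumption, see above
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # The application thread only puts the record on the queue.
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # The listener thread is the one that can block on the real sink.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener


# Typical wiring on a Linux box (not executed here):
#   sink = logging.handlers.SysLogHandler(address="/dev/log")
#   logger, listener = make_nonblocking_logger("myapp", sink)
#   ...
#   listener.stop()  # drains the queue before shutdown
```

The trade-off is that messages still queued in memory are lost if the process dies, so this moves the problem rather than solving it.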

It also frequently is unable to recover from a network blip, and a restart is the only solution. But its spool file is badly implemented, so on restart it will typically ignore the old spool files and start anew — meaning you lose data. Someone wrote a Perl script to fix a broken spool directory, but I never got it to work.

Ironically, Rsyslog is also terrible at logging what it's unhappy about at any given time, so whenever something bad happens, you probably won't get anything in the system log.

Rsyslog's configuration is a curious beast, and by curious I mean infuriating. Rsyslog originally had an antiquated, ad-hoc and messy line-oriented configuration file format (with directives like "$RepeatedMsgReduction off"), and the author decided to transition to a more modern, block/brace-based syntax. Unfortunately, he decided to do this gradually, and both syntaxes can co-exist in the same file. For a while, many of the options were only available in the old syntax, so you had to mix the two.

Which leads me to the next problem: The documentation is absolutely atrocious. The Rsyslog site is a fragmented mess of mostly outdated information. It's gotten better with v8, but it's still the worst OSS project documentation I've encountered. There's no reference section that lists the possible config options. Frequently there is no documentation for a particular setting. Rsyslog is quite finicky about some combinations of options (like TLS driver configs) and you have to proceed by trial and error. Frustratingly, it will silently ignore some config errors (such as trying to set up multiple TCP listeners, which is still not supported).

The new config format is better, but it still has the feel of something that has been implemented before it was fully designed.

Again, we don't use any of the fancy output modules. Maybe they are solid, but based on my experiences with the simple file-based stuff, I wouldn't bet on it.

As an aside, it's worth pointing out that Rsyslog is still using the Syslog protocol, which has all sorts of issues (not consistently implemented by clients or servers; does not support multi-line messages). Rsyslog has another protocol, RELP, that I believe you can use for forwarding, but I don't think it's been implemented outside of Rsyslog.

As far as I can tell, there aren't any good alternatives. syslog-ng's forwarding support is commercial and quite expensive. Logstash might work, but I don't want to run a memory-hungry Java app on each box.

Not sure what you mean by forwarding support - my entire environment is configured with syslog-ng forwarding to various places based on various rules.

Heck of a lot faster than rsyslog. The one thing I've not been able to do is get rsyslog forwarding to syslog-ng. Something happens to the message format between systems that leads to hilariously incorrect filenames on the collector systems.

By forwarding I mean reliable, disk-buffered forwarding. This only exists in the commercial "Premium Edition" of syslog-ng.
There is disk based buffering in NXLog CE. You might want to check it out with respect to the other woes you have with rsyslog.
Never heard of that one, thanks.
... which I highly recommend until you outgrow it and graduate to Kafka.
On the other hand, it's pricey and, last I checked, licensing is based on the number of machines (ridiculous in a cloud environment).
Logstash and rsyslog do different things, though there is some overlap.
It isn't new and shiny and it's available from distro packages, so it's not worth the attention of the cool kids.

If you don't need to curl|sh from a .io (or .sh) domain to install it, it's not worth using, apparently.

Just for information, I generally agree with what you said here, except in this particular case rsyslog is not a generic data bus, and Fluentd and logstash are, so they're useful on their own merit. They're just often used merely as log transports, which overlaps with rsyslog.