Hacker News
by laumars 4055 days ago
Oh jeez. Yes, there are better and more performant tools for parsing optimised binary databases; nobody disputes that. And yes, tools like Splunk are more user-friendly than grep; nobody disputes that either. But to advocate a binary-only system for logs is short-sighted, because logs are the go-to when everything else fails and thus need to be readable when every other tool dies. There are quite a few scenarios that could cause this too:

  * log file corruption - text parsing would still work,

  * tooling gets deleted - there's a million ways you
    can still render plain text even when you've lost
    half your POSIX/GNU userland,

  * network connection problems, breaking push to a
    centralised database - local text copies would still
    be readable.
  
In his previous blog post he commented that there's no point running both a local text version and a binary version, but since the entirety of his rant is really about tooling rather than log file format, I'm yet to see a convincing argument against running the two paradigms in parallel.
2 comments

The ease of recovering data from a corrupted log file depends on whether the logged events have been written as sequential records. This is true for text-based logs (the record delimiter being a newline), and is also true of the most popular binary (i.e. structured) log formats, namely Windows event logs and systemd's journals. It's probably not true if you're storing them in a more general-purpose database, though.

So this really is dependent on the file format of your log data, rather than an inherent difference between text and binary logging.
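To make the sequential-records point concrete for the text case, here is a minimal Python sketch. The function name and the sample log bytes are made up for illustration. It assumes the damage stays inside a record and doesn't eat the newline delimiters themselves.

```python
def salvage_lines(raw: bytes) -> list[str]:
    """Return every newline-delimited record that still decodes as UTF-8."""
    good = []
    for chunk in raw.split(b"\n"):
        if not chunk:
            continue
        try:
            good.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            # Damage is confined to this record, so skip it and carry on.
            continue
    return good


# Hypothetical log with one mangled record in the middle.
log = b"ok: service started\n" + b"\xff\xfe garbage \x80\n" + b"ok: request served\n"
print(salvage_lines(log))  # ['ok: service started', 'ok: request served']
```

The records either side of the damaged one survive because each record can be found without trusting the one before it.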

But if you're not storing them in a database then your primary advantage of using a binary format (namely performance) evaporates.
The difference is that a general-purpose database typically organises data in fixed-size pages, so new data could be anywhere in the file, as there is no guarantee of page ordering with regard to inserts. A specialised file format for logging, by contrast, adds new records at the end of the file (or in a circular fashion, depending on the design), but still has database-like features such as a defined schema and some form of indexing. This is true of systemd journals and Windows event logs anyway.
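As a rough illustration of an append-only binary record format, here is a Python sketch. The layout (a marker, a length, a CRC32, then the payload) is invented for this example and is not the systemd journal or Windows event log format. The idea is only that a reader can skip a damaged record and pick up again at the next marker.

```python
import struct
import zlib

MAGIC = b"\xfaLOG"               # hypothetical record marker
HEADER = struct.Struct("<4sII")  # marker, payload length, CRC32 of payload


def encode(payload: bytes) -> bytes:
    """Frame one record so it can be appended to the end of the log."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload


def recover(data: bytes) -> list[bytes]:
    """Return every record whose checksum still matches."""
    records = []
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)
        if pos < 0 or pos + HEADER.size > len(data):
            break
        _, length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) == length and zlib.crc32(payload) == crc:
            records.append(payload)
            pos = start + length
        else:
            pos += 1  # damaged record: resume scanning for the next marker
    return records


blob = bytearray(encode(b"boot") + encode(b"disk full") + encode(b"shutdown"))
blob[30] ^= 0xFF  # flip one byte inside the second record's payload
print(recover(bytes(blob)))  # [b'boot', b'shutdown']
```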
You can do the same kind of indexing with text files too, e.g. you see this with dictionary and thesaurus databases.
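A sketch of what that can look like, in Python: a side index mapping the first word of each line to its byte offset, so a lookup is a seek rather than a scan. The function names and sample entries are hypothetical, and a real index would normally be stored in its own file rather than rebuilt in memory.

```python
import io


def build_index(f) -> dict[str, int]:
    """Map the first word of each line to the byte offset where the line starts."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        key = line.split(b" ", 1)[0].decode("utf-8")
        index.setdefault(key, offset)  # keep the first occurrence of each key
    return index


def lookup(f, index: dict[str, int], key: str) -> str:
    """Jump straight to the indexed line instead of scanning the file."""
    f.seek(index[key])
    return f.readline().decode("utf-8").rstrip("\n")


# Hypothetical dictionary-style text file, held in memory for the example.
data = io.BytesIO(
    b"aardvark n. a burrowing mammal\n"
    b"abacus n. a counting frame\n"
    b"zebra n. a striped equid\n"
)
index = build_index(data)
print(lookup(data, index, "abacus"))  # abacus n. a counting frame
```

The text file stays readable with any pager even if the index is lost, since the index can always be rebuilt from it.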

Thus if you're going to sacrifice the "read anyway" ability of a log file then you really need to go for a fully optimised database to really take advantage of a binary format - rather than this half-and-half approach that has none of the real benefits of either but all of the same drawbacks of both.

If 'tooling gets deleted' is a problem you probably have much bigger concerns than log files.
> If 'tooling gets deleted' is a problem you probably have much bigger concerns than log files.

You do have a bigger concern, but one that needs to be addressed by consulting the log files.

I fully accept that most of the situations I gave as examples are rare fringe cases, but log files are the go-to when all else fails, and thus there needs to be a copy that's readable if and when everything else does fail.

'Tooling gets deleted' could easily happen after changing logging systems. While it would be short-sighted to uninstall your old logging system entirely (if you have logs lying around in that format), it's not unheard of.

The more likely situation would be that the logs are stored on a shared storage server, and the machine you are using to look at the logs doesn't have the logging system installed.

> The more likely situation would be that the logs are stored on a shared storage server, and the machine you are using to look at the logs doesn't have the logging system installed.

So expose the shared storage to a system running any current mainstream Linux distribution. I understand what you're saying, but this still doesn't seem like a huge concern.

... We were talking about logging systems with proprietary tools for manipulating logs. Ergo, 'any current mainstream Linux distribution' wouldn't have them installed by default.