by geographomics 4055 days ago
The ease of recovering data from a corrupted log file depends on whether the logged events have been written as sequential records. This is true for text-based logs (the record delimiter being a newline), and is also true of the most popular binary (i.e. structured) log formats, namely Windows event logs and systemd's journals. That is probably not the case if you're storing them in a more general-purpose database, though.
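As a minimal sketch of why sequential, delimited records are easy to salvage, the Python below splits a newline-delimited log and drops any line that fails to decode as UTF-8. Treating "invalid UTF-8" as the corruption check is an assumption made for illustration; real damage can also produce valid but wrong text, which this check would not catch.

```python
def recover_text_records(data: bytes) -> list[str]:
    """Split a newline-delimited log into records, skipping damaged ones.

    Assumption for illustration: a record counts as damaged if it is not
    valid UTF-8. Damage confined to one line leaves its neighbours intact,
    because the reader resynchronises at the next newline.
    """
    records = []
    for raw in data.split(b"\n"):
        if not raw:
            continue  # blank line or the empty tail after the final newline
        try:
            records.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            continue  # drop this record and carry on with the next line
    return records


if __name__ == "__main__":
    log = b"boot ok\nlogin alice\n\xff\xfe garbage\nlogin bob\n"
    print(recover_text_records(log))
```

Only the damaged middle line is lost; the records on either side of it survive because each one is bounded by its own delimiter.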

So this really depends on the file format of your log data, rather than on an inherent difference between text and binary logging.


But if you're not storing them in a database, then the primary advantage of using a binary format (namely performance) evaporates.
The difference is that a general-purpose database typically organises data in fixed-size pages, so new data could be anywhere in the file, as there is no guarantee of page ordering with respect to inserts. A specialised logging format, by contrast, adds new records at the end of the file (or wraps around in a circular fashion, depending on the design), but still has database-like features such as a defined schema and some form of indexing. This is true of systemd journals and Windows event logs, at least.
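To make the append-only idea concrete, here is a minimal sketch of a hypothetical binary record layout: a 4-byte marker `LOG1`, a payload length, and a CRC-32 of the payload. This is not the systemd journal or Windows event log format; the marker, header layout and function names are all invented for illustration. Because records are only ever appended, a reader can skip a damaged record and resynchronise on the next marker, much as a text reader resynchronises on the next newline.

```python
import struct
import zlib

MAGIC = b"LOG1"                  # hypothetical 4-byte record marker
HEADER = struct.Struct("<4sII")  # marker, payload length, CRC-32 of payload (12 bytes)


def append_record(buf: bytearray, payload: bytes) -> None:
    """Append one record at the end of the log; earlier bytes are never rewritten."""
    buf.extend(HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)))
    buf.extend(payload)


def scan_records(data: bytes) -> list[bytes]:
    """Read records in file order, resynchronising on MAGIC after corruption."""
    records = []
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)
        if pos == -1 or pos + HEADER.size > len(data):
            break  # no further marker, or not enough bytes left for a header
        _, length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) == length and zlib.crc32(payload) == crc:
            records.append(payload)
            pos = start + length  # jump straight to where the next record should begin
        else:
            pos += 1  # damaged or truncated record: search for the next marker


    return records


if __name__ == "__main__":
    log = bytearray()
    for p in (b"first", b"second", b"third"):
        append_record(log, p)
    # Flip one byte inside the second record's payload.
    log[HEADER.size + len(b"first") + HEADER.size] ^= 0xFF
    print(scan_records(bytes(log)))  # the damaged middle record is skipped
```

The checksum is what makes resynchronisation reasonably safe: a stray `LOG1` byte sequence inside a payload is only accepted as a record if the length and CRC that follow it also check out.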
You can do the same kind of indexing with text files too; you see this with dictionary and thesaurus databases, for example.
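A rough sketch of that kind of text indexing applied to a log, assuming each line starts with an ISO 8601 UTC timestamp and the lines are already in time order (so plain string comparison matches time order). The index is itself plain text, one tab-separated timestamp and byte-offset pair per line, so both files stay readable with ordinary tools. The function names and the index layout are hypothetical.

```python
import bisect


def build_index(log: bytes) -> str:
    """Return a plain-text index: one 'timestamp<TAB>byte_offset' line per record."""
    rows = []
    offset = 0
    for line in log.splitlines(keepends=True):
        stamp = line.split(b" ", 1)[0].strip().decode("ascii")
        if stamp:  # skip blank lines
            rows.append(f"{stamp}\t{offset}\n")
        offset += len(line)
    return "".join(rows)


def read_since(log: bytes, index_text: str, since: str) -> list[str]:
    """Binary-search the index, then seek straight to the first matching record."""
    entries = [row.split("\t") for row in index_text.splitlines()]
    stamps = [stamp for stamp, _ in entries]
    i = bisect.bisect_left(stamps, since)
    if i == len(entries):
        return []
    start = int(entries[i][1])
    return log[start:].decode("ascii").splitlines()


if __name__ == "__main__":
    log = (b"2024-01-01T10:00:00Z boot ok\n"
           b"2024-01-01T11:00:00Z login alice\n"
           b"2024-01-01T12:00:00Z login bob\n")
    index = build_index(log)
    print(index, end="")
    print(read_since(log, index, "2024-01-01T11:30"))
```

With a real file, `start` would be passed to `seek()` instead of slicing an in-memory buffer, so only the index and the tail of the log need to be read.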

Thus if you're going to sacrifice the "read anyway" ability of a log file, then you need to go for a fully optimised database to really take advantage of a binary format, rather than this half-and-half approach that has none of the real benefits of either but all of the drawbacks of both.