by shrubble 1362 days ago
This is basically sysadmin 101, however.

Compressing logs has been a thing since the mid-1990s.

Minimizing writes to disk, or setting up a way to coalesce them, has been around for as long as we have had disk drives. If your system doesn't have enough RAM to buffer writes so that more of them become sequential, your disk performance will suffer. This too has been known since the 1990s.
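
To make the coalescing point concrete, here is a minimal sketch in Python. It is only an illustration: `CountingRaw` and `write_lines` are made-up names, the "disk" is an in-memory stand-in that counts how many writes reach it, and the buffer sizes are arbitrary.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a disk file; counts the writes that reach it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        # Each call here represents one write hitting the underlying device.
        self.calls += 1
        self.data += b
        return len(b)


def write_lines(n_lines, buffer_size):
    """Write n_lines log lines through a RAM buffer of the given size."""
    raw = CountingRaw()
    out = io.BufferedWriter(raw, buffer_size=buffer_size)
    for i in range(n_lines):
        out.write(b"INFO request %d handled\n" % i)
    out.flush()  # push whatever is still buffered down to the raw stream
    return raw.calls, bytes(raw.data)


# Same log lines, tiny buffer versus a buffer large enough to hold them all.
small_calls, small_data = write_lines(1000, buffer_size=64)
large_calls, large_data = write_lines(1000, buffer_size=64 * 1024)
print(small_calls, large_calls)
```

The bytes written are identical in both runs; only the number of writes that reach the underlying stream changes, which is the effect a larger RAM buffer buys you.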

1 comment

Sysadmin 101 doesn't involve separating the dynamic portions of similar but unstructured log lines to dramatically improve compression and search performance.
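
A rough sketch of what "separating the dynamic portions" can look like, in Python. This is not the algorithm from the linked article, just an illustration under simple assumptions: every run of digits is treated as a variable, the `<*>` placeholder is assumed never to occur in real log text, and the function names are made up.

```python
import re

VAR = re.compile(r"\d+")  # assumption: any run of digits is a dynamic value
PLACEHOLDER = "<*>"       # assumption: never appears in real log lines


def split_line(line):
    """Split a log line into its static template and its dynamic values."""
    template = VAR.sub(PLACEHOLDER, line)
    values = VAR.findall(line)
    return template, values


def join_line(template, values):
    """Rebuild the original line by filling placeholders left to right."""
    it = iter(values)
    return re.sub(re.escape(PLACEHOLDER), lambda m: next(it), template)


def encode(lines):
    """Store each distinct template once; keep (template id, values) per line."""
    templates = {}  # template text -> small integer id
    rows = []
    for line in lines:
        template, values = split_line(line)
        tid = templates.setdefault(template, len(templates))
        rows.append((tid, values))
    return list(templates), rows


def decode(templates, rows):
    """Inverse of encode: recover the original lines in order."""
    return [join_line(templates[tid], values) for tid, values in rows]


logs = [
    "user 42 logged in from port 8080",
    "user 7 logged in from port 443",
    "disk 3 is 91 percent full",
]
templates, rows = encode(logs)
print(templates)
print(rows)
```

Once the repeated text is stored once and the values are stored separately, a search for a given template only has to look at the small template table, and the value columns tend to compress well because they are uniform.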

> Zstandard or Gzip do not allow gaps in the repetitive pattern; therefore when a log type is interleaved by variable values, they can only identify the multiple substrings of the log type as repetitive.

A sysadmin would use the logging facility (if traditional syslog) or simply awk/sed to split the logs into separate files whose contents are similar to each other (such as different levels of INFO/WARN/ERROR); then tune the DEFLATE dictionary used for compression (its window size, up to the format's 32 KiB limit, or a preset dictionary) until you get better compression.

See for instance this discussion of creating a custom DEFLATE dictionary: https://blog.cloudflare.com/improving-compression-with-prese...
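
For reference, a preset dictionary can be tried from Python's standard `zlib` module via the `zdict` argument. The sketch below is an assumption-laden illustration, not what the Cloudflare post does: the dictionary contents and the sample log line are invented, and the helper names are made up. DEFLATE's window is capped at 32 KiB, which also bounds how much of a preset dictionary is useful.

```python
import zlib

# Invented dictionary: text expected to recur in one class of log lines.
ZDICT = b"ERROR connection to database timed out after ms retrying"


def compress_with_dict(data, zdict):
    """Compress with a preset DEFLATE dictionary (zlib container)."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_with_dict(blob, zdict):
    """Decompress; the reader must supply the exact same dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


line = b"ERROR connection to database timed out after 5123 ms retrying"
plain = zlib.compress(line, 9)
primed = compress_with_dict(line, ZDICT)
print(len(line), len(plain), len(primed))
```

The gain is largest for short inputs like individual log lines, where the compressor otherwise has no history to refer back to. The catch is that the dictionary has to be distributed to whatever decompresses the data.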