Hacker News new | ask | show | jobs
by sa46 1363 days ago
Sysadmin 101 doesn't involve separating the dynamic portions of similar, but unstructured log lines to dramatically improve compression and search performance.

> Zstandard or Gzip do not allow gaps in the repetitive pattern; therefore when a log type is interleaved by variable values, they can only identify the multiple substrings of the log type as repetitive.

1 comments

A sysadmin would use the logging facility (if traditional syslog) or simply awk/sed to process the logs into different files that are similar to each other (such as different levels of INFO/WARN/ERROR); then, increase the size of the DEFLATE dictionary used for compression until you get better compression.

See for instance this discussion of creating a custom DEFLATE dictionary: https://blog.cloudflare.com/improving-compression-with-prese...