by spenczar5 562 days ago
But… this does drop data? Only the start and end timestamp are preserved; the middle ones have no time. How can this be called lossless?

Genuinely lossless compression algorithms like gzip work pretty well.
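To make "lossless" concrete, here is a minimal sketch of a gzip round trip using Python's standard `gzip` module. The log lines are made up for illustration; the point is only that every byte, including the middle timestamps, comes back after decompression.

```python
import gzip

# Hypothetical log lines, invented for this example.
log_lines = [
    "2024-01-01T00:00:00.000Z txn=42 start",
    "2024-01-01T00:00:00.120Z txn=42 query users",
    "2024-01-01T00:00:00.450Z txn=42 end",
]

original = "\n".join(log_lines).encode("utf-8")

# Compress, then decompress. Lossless means the bytes match exactly.
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

roundtrip_ok = restored == original
print("round trip identical:", roundtrip_ok)
```

On an input this small the compressed form may not be smaller than the original because of the gzip header; the savings show up on real log volumes.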

Exactly my thoughts: the order of these events by timestamp is itself necessary for debugging.

If I want something like per-transaction rollup of events into one log message, I build it and use it explicitly.
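A rough sketch of what an explicit per-transaction rollup could look like while keeping every event's timestamp. The field names (`ts`, `txn`, `msg`) and the function `rollup_by_transaction` are hypothetical, not taken from the article.

```python
import json
from collections import defaultdict

def rollup_by_transaction(events):
    """Group events by transaction ID, keeping each event's own timestamp.

    Assumes every timestamp is an ISO 8601 string in the same format and
    time zone, so sorting the strings also sorts them chronologically.
    """
    grouped = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        grouped[event["txn"]].append({"ts": event["ts"], "msg": event["msg"]})
    # One JSON log message per transaction, events in time order.
    return [
        json.dumps({"txn": txn, "events": items})
        for txn, items in grouped.items()
    ]

# Hypothetical events, deliberately out of order.
events = [
    {"ts": "2024-01-01T00:00:00.450Z", "txn": "a", "msg": "end"},
    {"ts": "2024-01-01T00:00:00.000Z", "txn": "a", "msg": "start"},
    {"ts": "2024-01-01T00:00:00.200Z", "txn": "b", "msg": "start"},
]

rolled = rollup_by_transaction(events)
for line in rolled:
    print(line)
```

Because each event keeps its `ts`, the interleaving across transactions can still be reconstructed later, which is what a start/end-only rollup gives up.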

Was going to point out the same thing: the original article's solution loses timestamps and possibly ordering. They are also losing some compressibility by converting to a structured format (JSON). And if they actually include a lot of UUIDs (their diagram is vague on what transaction IDs look like), then good luck, because those don't compress very well.

I worked at a Magnificent 7 company that compressed a lot of logs; after a lot of testing back in 2021, we found that zstd did the best all-around job.

We have a process monitor that basically polls ps output and writes it to JSON. We see ~30:1 compression using zstd on a ZFS dataset that stores these logs.

I laugh every time I see it.

Agreed.

If you used something like sequential IDs (even in some UUID format) it can compress pretty well.
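A quick way to check this yourself, sketched with the standard library only. Two stand-ins are assumptions here: `zlib` is used in place of zstd (which needs a third-party package on most Python versions), and `uuid.UUID(int=i)` is a simple counter rendered in UUID text format, not real UUIDv7.

```python
import uuid
import zlib

N = 10_000

# Random IDs: UUIDv4, one per line.
random_ids = "\n".join(str(uuid.uuid4()) for _ in range(N)).encode()

# Sequential IDs in UUID text format (a stand-in, not UUIDv7).
sequential_ids = "\n".join(str(uuid.UUID(int=i)) for i in range(N)).encode()

def ratio(data):
    """Return original size divided by zlib-compressed size (level 9)."""
    return len(data) / len(zlib.compress(data, 9))

print(f"random UUIDs:     {ratio(random_ids):.1f}:1")
print(f"sequential UUIDs: {ratio(sequential_ids):.1f}:1")
```

The exact ratios depend on the compressor and the data, but the sequential IDs should compress far better because consecutive values share almost all of their characters.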

As a member of the UUIDv7 cheering squad let me say 'rah rah'! :D
Which zstd compression level gave the best balance between compression ratio and run time?