by efitz 562 days ago
I was going to point out the same thing: the original article's solution loses timestamps and possibly ordering. They also lose some compressibility by converting to a structured format (JSON). And if they actually include a lot of UUIDs (their diagram is vague on what transaction IDs look like), then good luck; those don't compress very well.

I worked at a Magnificent 7 company that compressed a lot of logs; after a lot of testing back in 2021, we found that zstd did the best all-around job.

3 comments

We have a process monitor that basically polls ps output and writes it to JSON. We see ~30:1 compression using zstd on a ZFS dataset that stores these logs.

I laugh every time I see it.

Agreed.

If you use something like sequential IDs (even in some UUID format), they can compress pretty well.
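
A quick way to see the difference for yourself. This is only a rough sketch: it uses zlib from the Python standard library as a stand-in for zstd (so the absolute ratios will not match what zstd gives), and the "sequential" IDs are just a counter dropped into the 128-bit UUID integer, a made-up scheme for illustration rather than UUIDv7.

```python
import uuid
import zlib

def compressed_ratio(ids):
    """Return raw_size / compressed_size for newline-joined IDs."""
    raw = "\n".join(ids).encode("ascii")
    return len(raw) / len(zlib.compress(raw, 6))

N = 10_000

# Random IDs: uuid4 draws 122 random bits per ID.
random_ids = [str(uuid.uuid4()) for _ in range(N)]

# Sequential IDs in UUID format: a plain counter stored as the 128-bit value.
# (Hypothetical scheme for illustration; this is not UUIDv7.)
sequential_ids = [str(uuid.UUID(int=i)) for i in range(N)]

random_ratio = compressed_ratio(random_ids)
sequential_ratio = compressed_ratio(sequential_ids)

print(f"random:     {random_ratio:.2f}:1")
print(f"sequential: {sequential_ratio:.2f}:1")
```

The random IDs are bounded by their entropy (122 random bits in a 37-byte line), while consecutive sequential IDs differ in only a few trailing characters, which is exactly what a dictionary-based compressor exploits.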

As a member of the UUIDv7 cheering squad let me say 'rah rah'! :D
Which zstd compression level gave the best balance between compression ratio and run time?
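
One way to measure that tradeoff on your own data is a simple level sweep. The sketch below is an assumption-heavy illustration: it uses zlib (levels 1 to 9) from the Python standard library instead of zstd, and the log lines are synthetic JSON with made-up field names, so the numbers say nothing about zstd or about real logs. The same loop works with a zstd binding if you swap the compress call.

```python
import json
import time
import zlib

def make_sample_log(n_lines=20_000):
    """Build synthetic JSON log lines (hypothetical fields, for illustration)."""
    lines = []
    for i in range(n_lines):
        record = {
            "ts": 1_700_000_000 + i,
            "pid": 1000 + i % 50,
            "cmd": f"worker-{i % 7}",
            "rss_kb": 4096 + (i * 37) % 512,
        }
        lines.append(json.dumps(record))
    return "\n".join(lines).encode("utf-8")

def sweep_levels(data, levels=(1, 3, 6, 9)):
    """Return a list of (level, ratio, seconds), one entry per level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(data) / len(compressed), elapsed))
    return results

data = make_sample_log()
results = sweep_levels(data)
for level, ratio, seconds in results:
    print(f"level {level}: {ratio:.1f}:1 in {seconds * 1000:.1f} ms")
```

Timings from a single run are noisy, so repeat the sweep a few times and on a representative sample of your actual logs before picking a level.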