HN Mirror



	by spenczar5 562 days ago
	But… this does drop data? Only the start and end timestamp are preserved; the middle ones have no time. How can this be called lossless? Genuinely lossless compression algorithms like gzip work pretty well.
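To make the lossy/lossless distinction concrete, here is a minimal sketch with hypothetical events. The rollup shape (only `start`, `end`, and the messages) is an assumption about the article's scheme, not its actual format. gzip round-trips every byte. The rollup cannot give back the middle timestamps.

```python
import gzip
import json

# Hypothetical events for one transaction: (ISO timestamp, message).
events = [
    ("2024-01-01T00:00:00.000Z", "begin txn"),
    ("2024-01-01T00:00:00.120Z", "fetch user"),
    ("2024-01-01T00:00:00.450Z", "charge card"),
    ("2024-01-01T00:00:00.900Z", "commit txn"),
]
raw = "\n".join(f"{ts} {msg}" for ts, msg in events).encode("utf-8")

# gzip round-trips exactly: every byte, including every timestamp, comes back.
restored = gzip.decompress(gzip.compress(raw))
assert restored == raw

# Assumed shape of a start/end rollup: only the first and last timestamps survive.
rollup = {
    "start": events[0][0],
    "end": events[-1][0],
    "messages": [msg for _, msg in events],
}
kept = {rollup["start"], rollup["end"]}
lost = [ts for ts, _ in events if ts not in kept]

print(json.dumps(rollup))
print("timestamps that cannot be recovered:", lost)
```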

2 comments

corytheboyd 562 days ago

Exactly my thoughts: the order of these events by timestamp is itself necessary for debugging.

If I want something like a per-transaction rollup of events into one log message, I build it and use it explicitly.
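If you do build an explicit rollup, it doesn't have to lose timing. A minimal sketch, with a hypothetical `TxnRollup` class: each event is stored as an integer microsecond offset from the transaction start, so the original timestamps and their order can be rebuilt exactly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class TxnRollup:
    """Hypothetical per-transaction rollup that keeps every event's timing."""
    txn_id: str
    start: datetime
    # (microsecond offset from start, message), in the order events were added.
    entries: list[tuple[int, str]] = field(default_factory=list)

    def add(self, ts: datetime, message: str) -> None:
        # timedelta // timedelta yields an int, so no float rounding creeps in.
        offset_us = (ts - self.start) // timedelta(microseconds=1)
        self.entries.append((offset_us, message))

    def expand(self) -> list[tuple[datetime, str]]:
        # Rebuild (timestamp, message) pairs in their original order.
        return [(self.start + timedelta(microseconds=off), msg) for off, msg in self.entries]

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
original = [
    (t0, "begin txn"),
    (t0 + timedelta(milliseconds=120), "fetch user"),
    (t0 + timedelta(milliseconds=450), "charge card"),
    (t0 + timedelta(milliseconds=900), "commit txn"),
]
rollup = TxnRollup("txn-1", start=t0)
for ts, msg in original:
    rollup.add(ts, msg)
```

`rollup.expand()` returns the same list as `original`, so this kind of rollup is a lossless restructuring rather than a summary.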


efitz 562 days ago

I was going to point out the same thing: the original article's solution loses timestamps and possibly ordering. It also gives up some compressibility by converting to a structured format (JSON). And if they actually include a lot of UUIDs (their diagram is vague on what transaction IDs look like), then good luck; those don't compress very well.
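A quick way to see why random UUIDs hurt. This sketch assumes version-4 (random) UUIDs, which may not be what the article's transaction IDs are. Each one carries 122 random bits, so no compressor can shrink a list of them much below about 15 bytes per ID, versus 37 bytes of text per line.

```python
import uuid
import zlib

n = 10_000
# Version-4 UUIDs carry 122 random bits each; only the layout is predictable.
raw = "\n".join(str(uuid.uuid4()) for _ in range(n)).encode("ascii")
compressed = zlib.compress(raw, 9)
ratio = len(raw) / len(compressed)

# Information-theoretic floor: roughly 122 bits (~15.25 bytes) per UUID.
floor_bytes = n * 122 / 8
print(f"{len(raw)} -> {len(compressed)} bytes ({ratio:.2f}:1), floor ~{floor_bytes:.0f} bytes")
```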

I worked at a Magnificent 7 company that compressed a lot of logs; after a lot of testing back in 2021, we found that zstd did the best all-around job.
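For anyone wanting to run their own comparison, here is a rough harness sketch. It only uses Python's standard-library codecs (zlib, bz2, lzma), so zstd itself is not in it; a zstd binding could be added as another entry in the hypothetical `CODECS` table. The sample data is synthetic and will not reproduce anyone's production results.

```python
import bz2
import lzma
import time
import zlib

# name -> (compress, decompress); every codec here is lossless.
CODECS = {
    "zlib-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bz2-9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma-6": (lambda d: lzma.compress(d, preset=6), lzma.decompress),
}

def benchmark(data: bytes) -> dict:
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        blob = compress(data)
        t1 = time.perf_counter()
        assert decompress(blob) == data  # verify the round trip is exact
        t2 = time.perf_counter()
        results[name] = {
            "ratio": len(data) / len(blob),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results

# Synthetic JSON-lines sample; swap in real logs for meaningful numbers.
sample = b"".join(
    b'{"level":"info","txn":%d,"msg":"request handled"}\n' % i for i in range(20_000)
)
results = benchmark(sample)
for name, r in results.items():
    print(f"{name:8} ratio {r['ratio']:6.1f}  "
          f"compress {r['compress_s'] * 1000:7.1f} ms  decompress {r['decompress_s'] * 1000:7.1f} ms")
```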


greggyb 562 days ago

We have a process monitor that basically polls ps output and writes it to JSON. We see ~30:1 compression using zstd on a ZFS dataset that stores these logs.
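A sketch of why polled process snapshots compress so well: from one poll to the next, most fields (pid, user, command line, key names) repeat verbatim, and only a few numbers jitter. The process table and fields below are hypothetical, and it uses zlib rather than zstd on ZFS, so don't expect the exact ratio quoted above.

```python
import json
import random
import zlib

random.seed(0)
# Hypothetical process table: the same 50 long-running processes every poll.
procs = [
    {"pid": 1000 + i, "ppid": 1, "user": "svc", "state": "S",
     "cmd": f"/usr/bin/worker --shard {i}"}
    for i in range(50)
]

lines = []
for tick in range(200):  # 200 polls, 10 seconds apart
    for p in procs:
        record = dict(p, ts=1_700_000_000 + 10 * tick,
                      cpu=random.randint(0, 50) / 10,  # small jitter per poll
                      rss_kb=random.randint(50_000, 51_000))
        lines.append(json.dumps(record, sort_keys=True))

raw = "\n".join(lines).encode("utf-8")
compressed = zlib.compress(raw, 9)
ratio = len(raw) / len(compressed)
print(f"{len(lines)} records: {len(raw)} -> {len(compressed)} bytes ({ratio:.1f}:1)")
```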

I laugh every time I see it.


eru 562 days ago

Agreed.

If you use something like sequential IDs (even in a UUID format), they can compress pretty well.
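A small comparison sketch. It renders sequential integers in UUID text format via `uuid.UUID(int=i)` (one illustrative choice of "sequential") and compares them against random version-4 UUIDs under zlib. Consecutive sequential IDs share nearly every character with the previous line, which is exactly what an LZ-style compressor exploits.

```python
import uuid
import zlib

def compression_ratio(ids: list[str]) -> float:
    raw = "\n".join(ids).encode("ascii")
    return len(raw) / len(zlib.compress(raw, 9))

n = 10_000
random_ids = [str(uuid.uuid4()) for _ in range(n)]
# Sequential integers rendered in UUID format:
# 00000000-0000-0000-0000-000000000000, ...-000000000001, ...
sequential_ids = [str(uuid.UUID(int=i)) for i in range(n)]

r_random = compression_ratio(random_ids)
r_seq = compression_ratio(sequential_ids)
print(f"random v4: {r_random:.1f}:1, sequential: {r_seq:.1f}:1")
```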


willvarfar 562 days ago

As a member of the UUIDv7 cheering squad let me say 'rah rah'! :D
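For context on the cheering: UUIDv7 puts a 48-bit Unix millisecond timestamp in the leading bits, so IDs generated close together sort together and share a prefix. Below is a hand-rolled sketch following the RFC 9562 bit layout. Newer Python versions may ship their own `uuid7`; this `uuid7` function is purely illustrative. Note that the 74 random bits still don't compress; only the time prefix helps.

```python
import os
import time
import uuid
from typing import Optional

def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Hand-rolled UUIDv7 following the RFC 9562 bit layout:

    48-bit Unix timestamp (ms) | 4-bit version (7) | 12 random bits |
    2-bit variant (0b10) | 62 random bits
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; 74 are used
    rand_a = (rand >> 68) & 0xFFF                 # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in bits 127..80
    value |= 0x7 << 76                            # version 7 in bits 79..76
    value |= rand_a << 64                         # rand_a in bits 75..64
    value |= 0b10 << 62                           # RFC variant in bits 63..62
    value |= rand_b                               # rand_b in bits 61..0
    return uuid.UUID(int=value)

a = uuid7(1_700_000_000_000)
b = uuid7(1_700_000_000_001)
print(a, b)  # a sorts before b because its timestamp prefix is smaller
```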


pdimitar 562 days ago

Which zstd compression level gave the best balance between compression ratio and run time?
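The answer depends heavily on the data, so it's usually worth measuring on your own logs. Here's a sketch of a level sweep using zlib's levels 1 to 9 as a stand-in (zstd is not used here). If the zstd CLI is available, its benchmark mode (`zstd -b1 -e19 FILE`) does a similar sweep natively. The sample input is synthetic.

```python
import time
import zlib

def sweep_levels(data: bytes, levels=range(1, 10)) -> list[tuple[int, float, float]]:
    """Return (level, compression ratio, seconds to compress) for each zlib level."""
    rows = []
    for level in levels:
        start = time.perf_counter()
        blob = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        rows.append((level, len(data) / len(blob), elapsed))
    return rows

# Synthetic, log-like input; substitute a real log sample for meaningful numbers.
sample = b"".join(
    b"2024-01-01T00:00:%02d.%03dZ INFO txn=%d status=ok latency_ms=%d\n"
    % (i // 1000 % 60, i % 1000, i, i % 97)
    for i in range(50_000)
)
rows = sweep_levels(sample)
for level, ratio, secs in rows:
    print(f"level {level}: {ratio:5.1f}:1 in {secs * 1000:6.1f} ms")
```

The usual pattern is that ratio gains flatten out at higher levels while compression time keeps climbing, so the "knee" of that curve is the level to look for.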
