by justanotheratom 410 days ago
Great article and explanation.

On a practical level though, this would be the last thing I would use for log collection. I understand that when there is a spike, something has to be dropped. What should this something be?

I don't see the point of being "fair" about what is dropped.

I would use fairness as a last resort, after trying other things:

Drop lower priority logs: If your log messages have levels (debug, info, warning, error), prioritize higher-severity events, discarding the verbose/debug ones first.

Contextual grouping: Treat a sequence of logs as parts of an activity. For a successful activity, maybe record only the start and end events (or key state changes) and leave out repetitive in-between logs.

Aggregation and summarization: Instead of storing every log line during a spike, aggregate similar or redundant messages into a summarized entry. This not only reduces volume but also highlights trends.
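
To make two of these concrete, here is a minimal sketch of severity-first shedding and duplicate aggregation. The `SEVERITY` ranking, the `(level, message)` record shape, and both function names are assumptions made up for illustration; they do not come from the article or from any particular logging library.

```python
from collections import Counter

# Hypothetical severity ranking; higher means more important.
SEVERITY = {"debug": 0, "info": 1, "warning": 2, "error": 3}


def shed_by_severity(records, budget):
    """Keep at most `budget` records, discarding the lowest severities first.

    `records` is a list of (level, message) tuples in arrival order.
    """
    if budget <= 0:
        return []
    # Rank positions by severity, highest first. sorted() is stable even
    # with reverse=True, so among equal severities earlier records win.
    ranked = sorted(range(len(records)),
                    key=lambda i: SEVERITY[records[i][0]],
                    reverse=True)
    # Restore arrival order for the records that survive.
    keep = sorted(ranked[:budget])
    return [records[i] for i in keep]


def summarize(records):
    """Collapse identical (level, message) pairs into (level, message, count)."""
    counts = Counter(records)  # preserves first-seen order
    return [(level, message, n) for (level, message), n in counts.items()]
```

In this sketch, shedding never looks at message content, and aggregation only merges exact duplicates; a real collector would probably normalize variable fields (IDs, timestamps) before counting.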

3 comments

I’ve been down the observability rabbit hole recently, and what you’re describing is probably a mix of head and tail sampling: https://docs.honeycomb.io/manage-data-volume/sample/
Honeycomb seems quite mature, thanks.
The article addressed this. In fact, you don't typically want to throw away all of the low-priority logs ... you just want to limit them to a budget. And you want to limit the total number of log lines collected to a super budget.

Reservoir sampling can handle all of that.
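
A rough sketch of how that could look, assuming one reservoir per log level plus a second pass for the overall cap. This is one reading of the "budget" and "super budget" idea, not the article's implementation; `Reservoir`, `sample_window`, `level_budgets`, and `total_budget` are hypothetical names, and every level in the input is assumed to have a budget.

```python
import random


class Reservoir:
    """Uniform sample of up to `capacity` items from a stream (Algorithm R)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = rng if rng is not None else random.Random()

    def offer(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Still filling up: keep everything.
            self.items.append(item)
            return
        # Keep the new item with probability capacity / seen,
        # evicting a uniformly chosen resident.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item


def sample_window(records, level_budgets, total_budget, seed=None):
    """Apply per-level budgets, then cap the combined result.

    `records` is an iterable of (level, message) tuples.
    """
    rng = random.Random(seed)
    per_level = {lvl: Reservoir(cap, rng) for lvl, cap in level_budgets.items()}
    for level, message in records:
        per_level[level].offer((level, message))

    # Super budget: a second uniform pass over whatever survived.
    overall = Reservoir(total_budget, rng)
    for res in per_level.values():
        for record in res.items:
            overall.offer(record)
    return overall.items
```

Memory stays bounded by the budgets no matter how large the spike is, which is the property that matters when the alternative is choking. Note that the second pass here is uniform across levels, so it can evict errors as readily as debug lines.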

You should drop or consolidate some entries if you can, but the important entries that remain can still be too many, and then they need random culling, because anything is better than choking.

Fair reservoir sampling can be made unfair in controlled ways (e.g. by increasing the probability of retaining an entry if its content is particularly interesting); it competes with less principled biased random (or less than random) selection algorithms as a technique of last resort.
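
One known way to get that controlled unfairness is weighted reservoir sampling in the style of Efraimidis and Spirakis (A-Res): each entry draws a random key `u ** (1 / w)` and the `k` largest keys are kept. The sketch below assumes a caller-supplied `weight` function that scores how interesting an entry is; that function and the name `weighted_reservoir` are hypothetical.

```python
import heapq
import random


def weighted_reservoir(stream, k, weight, rng=None):
    """Keep up to k items, favoring items with larger weight(item).

    Items with weight <= 0 are never kept.
    """
    if k <= 0:
        return []
    rng = rng if rng is not None else random.Random()
    heap = []  # min-heap of (key, arrival_index, item)
    for n, item in enumerate(stream):
        w = weight(item)
        if w <= 0:
            continue
        # Larger weights push the key toward 1, so the item is more
        # likely to end up among the k largest keys.
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, n, item))
        elif key > heap[0][0]:
            # Evict the current smallest key.
            heapq.heapreplace(heap, (key, n, item))
    return [item for _, _, item in heap]
```

With all weights equal this reduces to a fair sample; raising the weight for, say, error-level entries biases retention toward them without guaranteeing it.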