Hacker News
by B0073D 1188 days ago
Recently I wanted to try using this to filter out logs I don't care about, but it turned out to be a lot more involved than I initially thought.

I essentially wanted to use this as a way to flexibly filter out items without having to come up with a regex for every line item.

I wonder if anyone has done this before...

4 comments

You could always rely on the Levenshtein distance. You have to be careful with similarity approaches, though, as you may end up filtering out important messages because they are structurally similar to the unimportant ones.
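For what it's worth, here is a rough sketch of the Levenshtein idea in plain Python. The `is_noise` helper, the `noise_examples` list, and the `max_ratio` threshold of 0.2 are all made up for illustration; the threshold in particular would need tuning on real logs.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def is_noise(line: str, noise_examples: list[str], max_ratio: float = 0.2) -> bool:
    """Hypothetical helper: True if `line` is close to any known-noise example.

    The distance is normalised by the longer string so that long and short
    lines are treated alike. 0.2 is an arbitrary starting point.
    """
    for example in noise_examples:
        longest = max(len(line), len(example)) or 1
        if levenshtein(line, example) / longest <= max_ratio:
            return True
    return False
```

The caveat about structural similarity shows up immediately: with "Disk usage at 41%" as a noise example, "Disk usage at 91%" is one edit away and would be dropped too.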
Maybe you could use a language model embedding to define some kind of semantic distance.
Just filter out logs by the file and line that generated them (i.e. where the log statement lives). Even if the actual log entry changes (e.g. because of a formatted string with variables), entries from the same statement will always have the same source.
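A minimal sketch of the source-based approach, assuming each log line looks like `LEVEL file:line message`. That layout, the `MUTED_SOURCES` set, and the function names are assumptions for illustration; a real logger's format will differ and the parsing would need adjusting.

```python
# Hypothetical set of (file, line) pairs whose log statements we want muted.
MUTED_SOURCES = {("app/cache.py", 88)}


def source_of(line: str):
    """Return (file, line_number) for a line shaped like 'LEVEL file:line message', else None."""
    parts = line.split(" ", 2)
    if len(parts) < 3:
        return None
    file, sep, lineno = parts[1].rpartition(":")
    if not sep or not file or not lineno.isdigit():
        return None
    return file, int(lineno)


def keep(line: str) -> bool:
    """Keep everything except lines emitted by a muted log statement.

    Lines that cannot be parsed are kept, so nothing is dropped by accident.
    """
    return source_of(line) not in MUTED_SOURCES
```

The message text never enters the decision, so variable parts of the entry (keys, IDs, timings) don't matter. The downside is that line numbers shift when the code is edited, so the muted set has to be maintained.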
Bayesian filters like for emails? You mark them as important or noise and over time it will learn. These are extremely easy to put in place and you don't have to preannotate as it learns as you go.
Yes Bayesian filters work well for this.

I had the idea that Splunk has them built in? But it's about 5 lines of Python anyway.

I often feel modern tools should offer that.
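For anyone curious what the Bayesian filter might look like, here is a small naive Bayes sketch using only the standard library. It is longer than five lines, and the class name, the two labels, and the tokenizer (lowercase words only, with numbers dropped so variable IDs don't pollute the vocabulary) are my own assumptions, not anything Splunk or another tool provides.

```python
import math
import re
from collections import Counter


class LogFilter:
    """Tiny naive Bayes classifier that learns as lines get marked."""

    LABELS = ("noise", "important")

    def __init__(self):
        self.tokens = {label: Counter() for label in self.LABELS}
        self.lines = Counter()

    @staticmethod
    def tokenize(line: str) -> list[str]:
        # Keep alphabetic words only; digits (IDs, timestamps) are ignored.
        return re.findall(r"[a-z_]+", line.lower())

    def mark(self, line: str, label: str) -> None:
        """Record one user judgement: label is 'noise' or 'important'."""
        self.tokens[label].update(self.tokenize(line))
        self.lines[label] += 1

    def score(self, line: str, label: str) -> float:
        """Log of prior * likelihood, with add-one smoothing throughout."""
        counts = self.tokens[label]
        total = sum(counts.values())
        vocab = len(set(self.tokens["noise"]) | set(self.tokens["important"]))
        logp = math.log((self.lines[label] + 1) / (sum(self.lines.values()) + 2))
        for tok in self.tokenize(line):
            # The extra +1 reserves mass for unseen tokens and avoids a zero divisor.
            logp += math.log((counts[tok] + 1) / (total + vocab + 1))
        return logp

    def is_noise(self, line: str) -> bool:
        # Ties (e.g. an untrained filter) fall on the side of keeping the line.
        return self.score(line, "noise") > self.score(line, "important")
```

Usage would be along the lines of calling `mark(line, "noise")` or `mark(line, "important")` whenever you triage a line by hand, then using `is_noise(line)` to hide the rest.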