HN Mirror

by B0073D 1188 days ago

Recently I wanted to try using this to filter out logs I don't care about, but it turned out to be a lot more involved than I initially thought.

I essentially wanted to use this as a way to flexibly filter out items without having to come up with a regex for every line item.

I wonder if anyone has done this before...

4 comments

zamadatix 1188 days ago

You could always rely on the Levenshtein distance. You have to be careful with similarity approaches, though, as you may end up filtering out important messages because they are structurally similar to an unimportant one.
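
A minimal sketch of the edit-distance idea, assuming you keep a small list of sample lines you already consider noise. The names `is_noise`, `known_noise`, and the `max_ratio` threshold of 0.2 are made up for illustration, not taken from any tool:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def is_noise(line: str, known_noise: list[str], max_ratio: float = 0.2) -> bool:
    """Hypothetical helper: True if line is within max_ratio edits of any sample."""
    for sample in known_noise:
        longest = max(len(line), len(sample)) or 1
        if levenshtein(line, sample) / longest <= max_ratio:
            return True
    return False


known_noise = ["Connection from 10.0.0.1 closed"]
print(is_noise("Connection from 10.0.0.7 closed", known_noise))
print(is_noise("Disk failure on /dev/sda", known_noise))
```

The caveat above still applies: a line that differs from a noise sample by only a word or two falls under the threshold as well, so the threshold probably needs tuning against real logs.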

voldacar 1187 days ago

Maybe you could use a language model embedding to define some kind of semantic distance.

funkylisp 1187 days ago

Just filter out logs by the file and line that generated them (i.e. where the log statement lives). Even if the actual log entry changes (e.g. because of a formatted string with variables), entries from the same statement will always have the same source.
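
As a rough sketch, assuming the logs come from Python's standard `logging` module (where every record carries `filename` and `lineno`), this could look like the following. `SourceFilter` and the muted location are hypothetical names for illustration:

```python
import logging


class SourceFilter(logging.Filter):
    """Drop records emitted from muted (filename, line number) locations."""

    def __init__(self, muted):
        super().__init__()
        self.muted = set(muted)

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False tells logging to discard the record.
        return (record.filename, record.lineno) not in self.muted


# Hypothetical muted location: the log statement at worker.py line 42.
handler = logging.StreamHandler()
handler.addFilter(SourceFilter({("worker.py", 42)}))
logging.getLogger().addHandler(handler)
```

One thing to watch: line numbers shift whenever the file is edited, so the muted list has to be kept in sync with the code.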

ta988 1188 days ago

Bayesian filters, like the ones used for email? You mark lines as important or noise, and over time the filter learns. These are extremely easy to put in place, and you don't have to pre-annotate anything because it learns as you go.
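
A minimal sketch of what that might look like, using a naive Bayes classifier over word tokens with add-one smoothing. The class name `LogBayes`, the two labels, and the tokenizer (lowercase alphabetic words only, so numbers and IDs are ignored) are all assumptions for illustration:

```python
import math
import re
from collections import Counter


class LogBayes:
    """Tiny naive Bayes filter that learns from lines you mark by hand."""

    LABELS = ("noise", "important")

    def __init__(self):
        self.tokens = {label: Counter() for label in self.LABELS}
        self.lines = Counter()

    @staticmethod
    def tokenize(line):
        # Alphabetic words only; variable parts such as numbers are dropped.
        return re.findall(r"[a-z]+", line.lower())

    def mark(self, line, label):
        """Record one hand-labelled line ('noise' or 'important')."""
        self.tokens[label].update(self.tokenize(line))
        self.lines[label] += 1

    def score(self, line, label):
        """Log of prior times token likelihoods, with add-one smoothing."""
        counts = self.tokens[label]
        total = sum(counts.values())
        vocab = len(set().union(*self.tokens.values())) + 1
        logp = math.log((self.lines[label] + 1) / (sum(self.lines.values()) + 2))
        for tok in self.tokenize(line):
            logp += math.log((counts[tok] + 1) / (total + vocab))
        return logp

    def is_noise(self, line):
        return self.score(line, "noise") > self.score(line, "important")


clf = LogBayes()
clf.mark("heartbeat ok from worker 3", "noise")
clf.mark("heartbeat ok from worker 9", "noise")
clf.mark("disk failure on sda", "important")
clf.mark("database connection failure", "important")
print(clf.is_noise("heartbeat ok from worker 12"))
print(clf.is_noise("disk failure detected"))
```

Calling `mark` again whenever the filter gets a line wrong is the "learns as you go" part: there is no separate training step.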

nl 1187 days ago

Yes, Bayesian filters work well for this.

I had an idea Splunk had them built in? But it's about 5 lines of Python anyway.

dopidopHN 1188 days ago

I often feel modern tools should offer that.
