| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by deafpolygon 1245 days ago

> but the second version used what I termed "Artificial Ignorance" - a process whereby you throw away the log entries you know aren't interesting. If there's anything left after you've thrown away the stuff you know isn't interesting, then the leftovers must be interesting. This approach worked amazingly well, and detected a number of very interesting operational conditions and errors that it simply never would have occurred to me to look for.

As a sysadmin, I took this approach as well. On the local machine, the server(s) would log normally. But, when I set-up centralized logging, I set-up a list of log entries that wouldn't normally interest me day-to-day. The server would only send to a central logging server things that weren't on this list. What was left were usually problems that I would need to pay attention to and they got fixed faster.

The rest of the uninteresting log entries would just be audited from time to time.

On the matter of security, every user that logs in on a daily basis gets logged with their IP address. Anytime that a user logged in with a different IP - it would get logged to the central log server and I would be notified. Most of the time, it was harmless.. but there were enough times I would find a compromised account in a sea of normal day-to-day login activity.

When your logs are full of normal things in it, it's easy to miss important details.

1 comments

somat 1245 days ago

I have the idea of doing spam detection style bayesien analysis on logs. the theory being you feed it your log stream, those are your normal logs, if the log stream start deviating from normal the statistical analysis starts to pop warnings. if it deviants for too long that would become the new normal.

At this point I am elbow deep in bayesien email code trying to work out the nuts and bolts of the operation. One important trick is that you need a location aware hash to feed into your statistics engine. A better hash would utilize the structure of log lines, but categorizing logs is big messy yak shaving sort of work. Perhaps a worse more generic hash would be good enough.

link

deafpolygon 1245 days ago

Or a list of regex strings?

link