| There are many log alerting systems on the market. The best known is probably Datadog. There’s also Logtail, Papertrail, Splunk, Logstash and others. These are well put-together products with a host of great features, such as excellent UIs, sophisticated live searching via web interfaces and sometimes query languages and alerting. They require various levels of installation and they have costs, either through volume-based tiered systems or monthly payments. For a bootstrapped business, this can be problematic, for instance when a surge of logs - indicating a possibly important problem that needs to be solved - pushes volume onto another tier. Should the “log ransom” be paid? Instead, I recalled from earlier times what is surely the simplest log watcher: Swatchdog [1]. It is rather venerable software. The file history in its source download shows dates in 2015, but it was written much earlier - in the 90s or possibly 80s - by Todd Atkins [2]. We wanted to have alerts in Slack - the blog explains how we did it. In short: *very simply*. The code is available [3]. [1]: https://github.com/ToddAtkins/swatchdog [2]: https://www.linkedin.com/in/toddatkins/ [3]: https://github.com/profitviews/swatchdog |
I like the opposite method: alert on everything by default and build up an allowlist that quietens the things you don't want to hear about. This is great for alerting you to unexpected things. And once in a while you actually want to know about some of those things :-)
It may sound very noisy but it's not too bad, especially once your allowlist is set up. Logcheck[0] is a good tool for this. By default it runs at 2 minutes past each hour and emails a report of everything that isn't on the allowlist. I think it also matches some regexes against what it deems higher-threat events, and those are always alerted on.
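To make the idea concrete, here's a rough Python sketch of allowlist-style filtering. This is not Logcheck's actual code or rule format; the patterns, the sample log lines and the `unexpected_lines` helper are all made up for illustration. Anything matching an "always alert" pattern is reported, and everything else is reported unless an allowlist pattern matches it.

```python
import re

# Hypothetical allowlist: lines matching any of these are treated as routine.
ALLOW = [
    re.compile(r"sshd\[\d+\]: Accepted publickey for \w+"),
    re.compile(r"CRON\[\d+\]: \(root\) CMD "),
]

# Hypothetical "always alert" patterns, checked before the allowlist.
ALWAYS = [
    re.compile(r"authentication failure", re.IGNORECASE),
]


def unexpected_lines(lines, allow=ALLOW, always=ALWAYS):
    """Return the lines that should end up in the report."""
    report = []
    for line in lines:
        if any(p.search(line) for p in always):
            # Higher-threat patterns are reported even if allowlisted.
            report.append(line)
        elif not any(p.search(line) for p in allow):
            # Default behaviour: anything not explicitly allowed is reported.
            report.append(line)
    return report


# Made-up sample log lines, for illustration only.
sample = [
    "Jan 10 10:00:01 web1 CRON[4242]: (root) CMD (run-parts /etc/cron.hourly)",
    "Jan 10 10:03:17 web1 sshd[991]: Accepted publickey for deploy from 10.0.0.5",
    "Jan 10 10:04:02 web1 kernel: EXT4-fs error (device sda1): bad block bitmap",
    "Jan 10 10:05:40 web1 sshd[1002]: pam_unix(sshd:auth): authentication failure; user=root",
]

for line in unexpected_lines(sample):
    print(line)
```

With these sample lines, the cron and accepted-login entries are filtered out and the kernel error and authentication failure are reported. The allowlist only ever grows in response to noise you've actually seen, which is why new, unexpected events still get through.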
I'll concede that this method isn't stellar for cattle! And we don't bother with it for things like Kubernetes clusters or servers with semi-regular turnover, for instance.
For pets and long-lived servers that need looking after, it's a good tool.
[0] https://logcheck.org/