| A few things I have learnt along the way: Logs are great, but only once you've identified the problem. If you are searching through logs to _find_ a problem, it's far too late. Processing/streaming logs to get metrics is a terrible waste of time, energy and money. Spend that on producing high-quality metrics directly from the apps you are looking after/writing/decomming (example: don't use access logs to collect 4xx/5xx and make a graph; collate and push the metrics directly). Raw metrics are pretty useless. They need to be manipulated into business goals: "service x is producing 3% 5xx errors" vs "% of visitors unable to perform action x". Alerts must be actionable. Alert rules must be sensible and clear-cut: "service x's response time is breaching its SLA", not "service x's response time is double its average for this time in May". |
Yeah nah, but, okay, nah yeah.
Generating metrics in the app is much more intrusive, and requires that you figure out the metrics you need ahead of time. It adds dependencies, sockets, and threads to your app.
Unless you're very careful, it's also easy to end up double-aggregating, computing medians of medians and other meaningless pseudo-statistics - if you're using the Dropwizard Metrics library, for example, you've already lost.
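To make the double-aggregation point concrete, here's a tiny sketch. The host names and latency numbers are made up for illustration; the only real thing is the arithmetic. Each host reports its own median, and then something downstream takes the median of those:

```python
from statistics import median

# Hypothetical per-host request latencies in milliseconds (made-up numbers).
# host-c is the busy, slow one: it serves most of the traffic.
latencies_by_host = {
    "host-a": [10, 11, 12],
    "host-b": [10, 12, 13],
    "host-c": [200, 210, 220, 230, 240, 250, 260, 270, 280],
}

# Pre-aggregated path: one median per host, then a median of those medians.
per_host_medians = {host: median(values) for host, values in latencies_by_host.items()}
median_of_medians = median(per_host_medians.values())

# What you actually wanted: the median over every individual request.
all_latencies = [v for values in latencies_by_host.values() for v in values]
true_median = median(all_latencies)

print(median_of_medians, true_median)
```

The median of medians weights every host equally no matter how many requests it served, so with these numbers it lands on one of the quiet hosts while most requests were actually slow. Once the per-request data has been thrown away, there's no way to get the real figure back.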
If you output structured log events, where everything is JSON or whatever and there are common schema elements, you can easily pull out the metrics you need, configure new ones on the fly, and retrospectively calculate them if you keep a window of log history.
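As a rough sketch of what i mean by post-calculated metrics: assume each log line is a JSON object with a few common fields. The schema here (`ts`, `service`, `status`, `duration_ms`), the sample events, and the function name are all invented for the example, not taken from any real system:

```python
import json
from collections import defaultdict

# Hypothetical structured log events; field names are an assumed common schema.
LOG_LINES = [
    '{"ts": "2024-05-01T10:00:00Z", "service": "checkout", "status": 200, "duration_ms": 35}',
    '{"ts": "2024-05-01T10:00:01Z", "service": "checkout", "status": 500, "duration_ms": 120}',
    '{"ts": "2024-05-01T10:00:02Z", "service": "search", "status": 200, "duration_ms": 18}',
    '{"ts": "2024-05-01T10:00:03Z", "service": "checkout", "status": 200, "duration_ms": 40}',
    '{"ts": "2024-05-01T10:00:04Z", "service": "search", "status": 404, "duration_ms": 9}',
    '{"ts": "2024-05-01T10:00:05Z", "service": "checkout", "status": 503, "duration_ms": 300}',
]


def error_rate_by_service(lines, min_status=500):
    """Fraction of events per service whose status is >= min_status."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for line in lines:
        event = json.loads(line)
        totals[event["service"]] += 1
        if event["status"] >= min_status:
            errors[event["service"]] += 1
    return {service: errors[service] / totals[service] for service in totals}


# The metric you thought of up front: 5xx rate per service.
print(error_rate_by_service(LOG_LINES))

# A metric you only thought of later, computed over the same retained history.
print(error_rate_by_service(LOG_LINES, min_status=400))
```

The second call is the bit that matters: a new metric is just a new query over events you already have, with no redeploy of the app and no gap in the history.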
When I've worked on systems with both pre- and post-calculated metrics, the post-calculated metrics were vastly more useful.
The huge, virtually showstopping, caveat here is that there is lots of decent, easy-to-use tooling for pre-calculated metrics, and next to nothing for post-calculated metrics. You can drop in some libraries and stand up a couple of servers and have traditional metrics going in a day, with time for a few games of table tennis. You need to build and bodge a terrifying pile of stuff to get post-calculated metrics going.
Anyway, if there's a VC reading this with twenty million quid burning a hole in their pocket who isn't fussy about investing in companies with absolutely no path to profitability, let me know, and I'll do a startup to fix all this. I'll even put the metrics on the blockchain for you, guaranteed street cred.