|
Operationally speaking, the single most important thing you should be doing is collecting application and system logs and making them easily accessible and usable (and checking your backups every now and again). I say this because of the value you gain compared to the relatively small cost. You're being your own worst enemy if you aren't staying on top of error logs. The OSS solutions are mature and simple to set up, and it isn't something you need to get absolutely correct with 100% uptime. If you're an "average" company, a single server running Logstash+ES+Kibana is probably good enough. There are only two ways you can do this wrong: not doing it at all, or forwarding non-actionable items (which creates a low signal-to-noise ratio, so people will just ignore it). After that comes metrics (system and application), which are important but not as trivial to set up. Quickly looking at LogZoom, I think the more forwarding options we have, the better. They make it very clear that, unlike Logstash, this doesn't help structure data. On one hand, I don't think that's a big deal. Again, if you're only writing out actionable items, and if you're staying on top of them, almost anything that moves data from your app/servers into ES+Kibana (or whatever) is going to be good enough. On the flip side, adding structure to the logs can help as you grow. Grouping/filtering by server, location, type (app vs. system), version... is pretty important. I like Logstash. I actually think it's fun to configure (the grok patterns), and it helps you reason about what you're logging and what you're hoping to get from those logs. |
To respond to your point about the absence of a Grok-like facility: avoiding the need to unmarshal and remarshal the data as it passes through LogZoom was an explicit design requirement. The blog post refers to our pain with Logstash, Fluentd, etc. We were in a situation where our production code was fighting for resources against a log-collecting facility.
In general, it's best to process the data (to the extent possible) closest to its point of origin. It's orders of magnitude cheaper to create a well-structured log line straight from your production code (where it's just some in-memory manipulation of freshly created strings) than in a post-processing step inside a separate process (or machine).
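To make that concrete, here is a rough sketch of emitting an already structured line at the source. It assumes JSON as the wire format and uses made-up field names (`host`, `version`, `kind`); `format_event` is a hypothetical helper for illustration, not part of LogZoom or any logging library.

```python
import json
import time


def format_event(level, message, **fields):
    """Build one JSON log line from values the app already holds in memory."""
    record = {
        "ts": time.time(),   # seconds since the epoch
        "level": level,
        "msg": message,
    }
    # Hypothetical extra fields (host, version, kind, ...) chosen by the caller.
    record.update(fields)
    # json.dumps escapes embedded newlines, so each event stays on one line.
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(format_event("error", "payment failed",
                       host="web-3", version="1.4.2", kind="app"))
```

Since the fields are attached where the event is created, grouping by server, version, or type downstream needs no pattern matching against free-form text.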
I've spent years dealing with performance problems in global-scale production stacks, and a surprisingly high number of resource bottlenecks (memory, CPU, disk I/O, etc.) are caused by ignoring this simple principle.
I've lost count of the cases where a simple restructuring of the architecture to avoid a marshal/unmarshal step drastically cut resource requirements and operational headaches. Unfortunately, a whole lot of industry "best practices" (exemplified by the Grok step in Logstash) encourage the opposite behavior.
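As an illustration of skipping the marshal/unmarshal step, a forwarder can treat each record as opaque bytes and only split on newlines. This is a minimal sketch under that assumption, not LogZoom's actual code; `forward_lines` is a hypothetical name, and the in-memory streams stand in for real sockets or files.

```python
import io


def forward_lines(source, sink):
    """Copy newline-delimited records from source to sink as raw bytes.

    Nothing is decoded, parsed, or re-encoded along the way.
    Returns the number of records forwarded.
    """
    count = 0
    for raw in source:              # a binary stream yields one bytes line at a time
        if not raw.strip():         # skip blank lines
            continue
        # Make sure every forwarded record ends with exactly one newline.
        sink.write(raw if raw.endswith(b"\n") else raw + b"\n")
        count += 1
    return count


if __name__ == "__main__":
    src = io.BytesIO(b'{"level":"error","msg":"payment failed"}\n\n'
                     b'{"level":"warn","msg":"slow query"}')
    dst = io.BytesIO()
    forwarded = forward_lines(src, dst)
    print(forwarded, "records forwarded")
```

The per-record cost here is a byte copy. A pipeline that parses each line into fields and serializes it again pays for that work on every event.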