Hacker News

by latch 3730 days ago

Operationally speaking, the single most important thing you should be doing is collecting application and system logs and making them easily accessible and usable (and checking your backups every now and again). I say this because the value you gain far outweighs the relatively small cost. You're being your own worst enemy if you aren't staying on top of error logs.

The OSS solutions are mature and simple to set up. And it isn't something you need to get absolutely correct with 100% uptime. If you're an "average" company, a single server running Logstash+ES+Kibana is probably good enough. There are only two ways you can do this wrong: not doing it at all, or forwarding non-actionable items (which creates a low signal-to-noise ratio, and people will just ignore it).
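
A minimal sketch of the "forward only actionable items" rule as a filter in front of whatever ships logs to ES, assuming one JSON object per line with a `level` field. The field name and the `ACTIONABLE_LEVELS` set are assumptions for illustration, not anything Logstash or Kibana require:

```python
import json
from typing import Iterable, Iterator

# Assumed set of levels worth a human's attention; tune for your own apps.
ACTIONABLE_LEVELS = {"ERROR", "CRITICAL", "FATAL"}


def actionable(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines whose 'level' field is actionable.

    Lines that are not a JSON object are passed through unchanged, so a
    malformed line gets surfaced rather than silently dropped.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            yield line
            continue
        if not isinstance(record, dict):
            yield line
            continue
        if str(record.get("level", "")).upper() in ACTIONABLE_LEVELS:
            yield line


if __name__ == "__main__":
    sample = [
        '{"level": "info", "msg": "user logged in"}',
        '{"level": "error", "msg": "payment gateway timeout"}',
        "not json at all",
    ]
    for kept in actionable(sample):
        print(kept)
```

Passing unparseable lines through is a deliberate choice here: a formatting bug shows up in Kibana instead of quietly disappearing.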

After that come metrics (system and application), which are important but not as trivial to set up.

From a quick look at LogZoom, I think the more forwarding options we have, the better. They make it very clear that, unlike Logstash, it doesn't help structure data. On one hand, I don't think that's a big deal. Again, if you're only writing out actionable items, and if you're staying on top of it, almost anything that moves data from your app/servers onto ES+Kibana (or whatever) is going to be good enough.

On the flip side, adding structure to the logs can help as you grow. Grouping/filtering by servers, locations, types (app vs. system), versions... is pretty important. I like Logstash; I actually think it's fun to configure (the grok patterns), and it helps you reason about what you're logging and what you're hoping to get from those logs.
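
For anyone who hasn't written grok patterns: a reference like `%{IP:client}` is essentially a named regex capture. Here is a rough Python sketch of that expansion, plus tagging each record with host and type so you can group/filter on them later. `BASE_PATTERNS` is a simplified, hypothetical subset; the real Logstash pattern library is much larger and more careful:

```python
import re
from typing import Dict, Optional

# Simplified stand-ins for a few grok base patterns (hypothetical subset;
# e.g. the real IP pattern also handles IPv6).
BASE_PATTERNS: Dict[str, str] = {
    "TIMESTAMP": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z",
    "LOGLEVEL": r"DEBUG|INFO|WARN|ERROR",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
    "GREEDYDATA": r".*",
}

_REF = re.compile(r"%\{(\w+):(\w+)\}")


def compile_grok(pattern: str) -> "re.Pattern[str]":
    """Expand each %{PATTERN:field} reference into a named capture group."""

    def expand(m: "re.Match[str]") -> str:
        name, field = m.group(1), m.group(2)
        return f"(?P<{field}>{BASE_PATTERNS[name]})"

    return re.compile(_REF.sub(expand, pattern))


def parse(line: str, grok: "re.Pattern[str]", host: str, log_type: str) -> Optional[Dict[str, str]]:
    """Turn a raw line into a structured record tagged with host/type
    metadata; returns None when the pattern doesn't match."""
    m = grok.fullmatch(line)
    if m is None:
        return None
    record = m.groupdict()
    record.update({"host": host, "type": log_type})
    return record


if __name__ == "__main__":
    grok = compile_grok("%{TIMESTAMP:ts} %{LOGLEVEL:level} %{IP:client} %{GREEDYDATA:msg}")
    print(parse("2016-03-01T12:00:00Z ERROR 10.0.0.7 upstream timed out", grok, "web-1", "app"))
```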

chetanahuja 3730 days ago

PacketZoom founder here. Glad you liked the project. Could not agree more with the importance of tracking logs (and metrics... but that's a topic for another post).

To respond to your point about the absence of a Grok-like facility: avoiding the need to unmarshal and remarshal the data while passing through LogZoom was an explicit design requirement. The blog post refers to our pain with Logstash/Fluentd etc. We were in a situation where our production code was fighting for resources against a log collecting facility.

In general, it's best to process the data (to the extent possible) closest to its point of origin. It's orders of magnitude cheaper to create a well-structured log line straight from your production code (where it's just some in-memory manipulation of freshly created strings) than in a post-processing step inside a separate process (or machine).
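
As a sketch of what "structured straight from your production code" can look like, here is Python's standard `logging` module with a custom formatter that emits one JSON object per line. The `fields` key passed through `extra=` is a convention assumed for this example, not part of the logging API:

```python
import json
import logging
import sys
import time


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line.

    The structure comes from in-memory values the app already has, so
    nothing downstream needs to parse free-form text.
    """

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Assumed convention: structured fields arrive via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Build a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    log.error("payment failed", extra={"fields": {"order_id": 1234, "latency_ms": 870}})
```

Calling `make_logger` twice with the same name would attach two handlers; a real app would configure logging once at startup.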

I've spent years dealing with performance problems in global-scale production stacks, and a surprisingly high number of resource bottlenecks (memory, CPU, disk I/O, etc.) are caused by ignoring this simple principle.

I've lost count of the cases where a simple restructuring of the architecture to avoid a marshal/unmarshal step drastically cuts down resource requirement and operational headaches. Unfortunately a whole lot of industry "best practices" (exemplified by the Grok step in Logstash) encourage the opposite behavior.

ktamura 3730 days ago

>To respond to your point about absence of Grok like facility, avoiding the need to unmarshal and remarshal the data while passing through LogZoom was an explicit design requirement. The blogpost refers to our pain with Logstash/Fluentd etc.

I think there are two different (CPU) performance problems conflated into one:

(1) The cost of parsing logs with something like Grok and Regexp

(2) The cost of marshaling and unmarshaling data

While both do cost CPU time, based on my experience talking to literally hundreds of Fluentd users (I'm a maintainer and was a core support member for a while), the cost of (1) dwarfs the cost of (2). (2) is pretty cheap if you use an efficient serializer like MessagePack. As for (1), both Logstash and Fluentd support an option to perform zero parsing (in Fluentd, it's "format none"). By using these options, you can bring down CPU time significantly.
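
A rough way to measure the difference on your own logs. This times three strategies per line: pass-through (the moral equivalent of "format none"), a serialize/deserialize round trip of an already-structured record (stdlib JSON standing in for MessagePack, an assumption of this sketch), and regex extraction. The access-log regex is a simplified stand-in for a grok pattern, not Logstash's actual `COMBINEDAPACHELOG`:

```python
import json
import re
import timeit

LINE = '10.0.0.7 - - [01/Mar/2016:12:00:00 +0000] "GET /api/items?id=42 HTTP/1.1" 200 512'

# Simplified Apache-style access-log regex (stand-in for a grok pattern).
ACCESS_RE = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)


def passthrough(line: str) -> dict:
    """Zero parsing: just wrap the raw line."""
    return {"message": line}


def roundtrip(record: dict) -> dict:
    """Serialize then deserialize an already-structured record.
    JSON stands in for MessagePack to stay within the stdlib."""
    return json.loads(json.dumps(record))


def regex_parse(line: str) -> dict:
    """Extract fields from free-form text, as a grok filter would."""
    m = ACCESS_RE.match(line)
    return m.groupdict() if m else {"message": line}


if __name__ == "__main__":
    structured = regex_parse(LINE)
    n = 100_000
    for name, fn in [
        ("pass-through", lambda: passthrough(LINE)),
        ("serialize round trip", lambda: roundtrip(structured)),
        ("regex parse", lambda: regex_parse(LINE)),
    ]:
        secs = timeit.timeit(fn, number=n)
        print(f"{name:22s} {secs / n * 1e6:.2f} us/line")
```

No expected numbers here on purpose: results depend heavily on the serializer, the pattern, and the runtime. Real grok filters often try several patterns in turn per line, so a single precompiled regex probably understates the parsing side.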

All of this being said, it looks like LogZoom isn't a true competitor to Fluentd, Logstash, or Heka. It makes different performance/functionality trade-offs, and by doing less, it saves more CPU time. If you forgo the option of parsing logs at the source, you obviously save resources (and, in Logstash's and Fluentd's defense, they do a whole lot more). On the flip side, you need to post-process your logs to make them useful, so some other server downstream pays the CPU cost. (You might not care about this because your logs have been thrown over the fence and are now the data engineers' job =p)

jsmeaton 3730 days ago

I think you make a good point that logs should be transformed closer to the source. I work, primarily, with applications provided by a vendor, with very unstructured log data. Transforming (Grok) these logs is an absolute must, we couldn't look at something that didn't allow transformation. That said, maybe we should be looking at something closer to the source before handing it off to a central location. Are you aware of agent-like daemons that do transformation before handoff?
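
Dedicated agents exist for this, but the shape of a source-side transform is simple: read raw lines, apply the pattern, add metadata, and hand JSON to the shipper. A minimal Python sketch, where `VENDOR_RE` describes a hypothetical vendor log format:

```python
import json
import re
import socket
import sys
from typing import Iterable, Iterator, TextIO

# Hypothetical vendor format: "<date> <time> [<level>] <component>: <text>"
VENDOR_RE = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<component>[\w.-]+): (?P<text>.*)"
)


def transform(lines: Iterable[str], host: str) -> Iterator[str]:
    """Turn unstructured vendor lines into JSON lines at the source.

    Unmatched lines are kept and tagged instead of dropped, so a pattern
    miss shows up in the central store rather than vanishing.
    """
    for raw in lines:
        raw = raw.rstrip("\n")
        m = VENDOR_RE.fullmatch(raw)
        record = m.groupdict() if m else {"text": raw, "tags": ["_parsefailure"]}
        record["host"] = host
        yield json.dumps(record)


def run(src: TextIO = sys.stdin, dst: TextIO = sys.stdout) -> None:
    """Stream stdin to stdout, one JSON object per input line."""
    for out in transform(src, socket.gethostname()):
        dst.write(out + "\n")
        dst.flush()  # hand off promptly to whatever ships stdout onward


if __name__ == "__main__":
    run()
```

Something like `tail -F vendor.log | python agent.py` would feed it, with stdout picked up by your existing shipper. The `_parsefailure` tag is in the spirit of Logstash's `_grokparsefailure`.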

seanp2k2 3730 days ago

Structured logs are awesome and a great idea. For the next few decades, while standards come and go and everyone gets them implemented across the board, yes, it sucks to write grok patterns for the flavor of the week. But once you've done it a few times, it takes maybe a few hours of work to get an app cluster with a moderate amount of logging flowing into ES, with all the right types and all the edge cases accounted for. From there, ELK is such a Swiss Army knife that it's worth the trouble: it's then trivial to, e.g., fire PagerDuty alerts if you hit exception-level log lines, post metrics about your logs, or put them on a queue to flow into a big data pipeline.
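
The "Swiss Army knife" part boils down to fanning each structured record out to several consumers. A minimal sketch, where `send_alert` is a hypothetical callback standing in for a PagerDuty integration (this is not PagerDuty's API), a `Counter` stands in for metrics about your logs, and a `queue.Queue` stands in for the big data pipeline:

```python
import queue
from collections import Counter
from typing import Callable, Dict, Iterable

ALERT_LEVELS = {"ERROR", "FATAL"}  # assumed "exception-level" threshold


def route(
    records: Iterable[Dict[str, str]],
    send_alert: Callable[[Dict[str, str]], None],
    pipeline: "queue.Queue[Dict[str, str]]",
) -> Counter:
    """Fan structured log records out to alerting, metrics and a queue.

    `send_alert` is a placeholder for whatever pages on-call; it is not a
    real client API.
    """
    counts: Counter = Counter()
    for rec in records:
        level = rec.get("level", "UNKNOWN").upper()
        counts[level] += 1  # metrics about the logs themselves
        if level in ALERT_LEVELS:
            send_alert(rec)
        pipeline.put(rec)  # downstream pipeline consumer
    return counts


if __name__ == "__main__":
    q: "queue.Queue[Dict[str, str]]" = queue.Queue()
    alerts = []
    counts = route(
        [{"level": "info", "msg": "ok"}, {"level": "error", "msg": "NullPointerException"}],
        alerts.append,
        q,
    )
    print(counts, len(alerts), q.qsize())
```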

b0ti 3730 days ago

You might want to consider NXLog if you need to do transformation at the source. For us, this was an explicit design goal. It's also lightweight; a lot of people use it in place of other fat, bulky solutions, and it's quite popular with ELK users.

nathwill 3730 days ago

We use Heka and love it.
