| HN Mirror


by ktamura 3730 days ago

>To respond to your point about the absence of a Grok-like facility, avoiding the need to unmarshal and remarshal the data while passing through LogZoom was an explicit design requirement. The blog post refers to our pain with Logstash/Fluentd etc.

I think there are two different (CPU) performance problems conflated into one:

(1) The cost of parsing logs with something like Grok and Regexp

(2) The cost of marshaling and unmarshaling data

While both do cost CPU time, based on my experience talking to hundreds of Fluentd users (I'm a maintainer and was a core support member for a while), the cost of (1) dwarfs the cost of (2). (2) is pretty cheap if you use an efficient serializer like MessagePack. As for (1), both Logstash and Fluentd support an option to perform zero parsing (in Fluentd, it's "format none"). By using these options, you can bring down CPU time significantly.
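If you want to sanity-check the (1) vs. (2) claim on your own machine, here is a rough Python sketch. Everything in it is an assumption made for illustration: the sample log line, the simplified regex (written in the spirit of a Grok pattern, not an actual one), and the use of the standard library's `json` as a stand-in for MessagePack, which is a separate package. It times regex parsing, a serialize/deserialize round trip, and a "zero parsing" pass-through that just wraps the raw line.

```python
import json
import re
import timeit

# Hypothetical Apache-style access log line, used only for illustration.
LINE = '127.0.0.1 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

# Simplified pattern in the spirit of a Grok expression (not a real Grok pattern).
PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)


def parse(line):
    """Cost (1): extract named fields from a raw line with a regex."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None


def roundtrip(record):
    """Cost (2): marshal and unmarshal a record (json stands in for MessagePack)."""
    return json.loads(json.dumps(record))


def passthrough(line):
    """Zero parsing: wrap the raw line in a single field and move on."""
    return {"message": line}


def bench(fn, arg, number=20000):
    """Return the total seconds taken to call fn(arg) `number` times."""
    return timeit.timeit(lambda: fn(arg), number=number)


if __name__ == "__main__":
    record = parse(LINE)
    print("regex parse :", bench(parse, LINE))
    print("round trip  :", bench(roundtrip, record))
    print("pass-through:", bench(passthrough, LINE))
```

The absolute numbers depend heavily on the pattern, the runtime, and the serializer, so treat the output as a relative indication only, not as a measurement of Fluentd or Logstash themselves.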

All of this being said, it looks like LogZoom isn't a true competitor to Fluentd, Logstash, or Heka. It made different performance/functionality trade-offs, and by doing less it saves CPU time: if you forgo the option of parsing logs at the source (and in Logstash and Fluentd's defense, they do a whole lot more), you obviously save resources. On the flip side, you need to post-process your logs to make them useful, and some other servers downstream will pay the CPU cost. (You might not care about this because your logs have been thrown over the fence and now it's the data engineers' job =p)