by andrewvc 3730 days ago
Hi all, Logstash developer here. It's always exciting to see new stuff in this space. However, this post has me confused. Maybe the OP can clue me in.

I'm a bit confused because the assertion "This worked for a while, but when we wanted to make our pipeline more fault-tolerant, Logstash required us to run multiple processes." is no more true for Logstash than it is for any other piece of software. Any single process can fail, so it can be nice to run several. It would be great if the author of the piece had clarified that further. If you're around I'd love to hear specifically what you mean by this. Internally Logstash is very thread friendly, we only recommend multiple processes when you want either greater isolation or greater fault tolerance.

I don't personally see what the difference is between:

Filebeat -> LogZoom -> Redis -> Logstash -> (Backends)

and

Filebeat -> Logstash -> Redis -> Logstash -> (Backends)

or even better

Filebeat -> Redis -> Logstash -> (Backends)

You can read more about the filebeat Redis output here: https://www.elastic.co/guide/en/beats/filebeat/current/redis...

1 comments

> If you're around I'd love to hear specifically what you mean by this. Internally Logstash is very thread friendly, we only recommend multiple processes when you want either greater isolation or greater fault tolerance.

Right, we considered using multiple Logstash processes, but we really didn't want to run three instances of Logstash requiring three relatively heavyweight Java VMs. The total memory consumption of a single VM running Logstash is higher than running three different instances of LogZoom.

We looked at the Filebeat Redis output as well. First, it didn't seem to support encryption or client authentication out of the box. But what we really wanted was a way to make Logstash duplicate the data into two independent queues so that Elasticsearch and S3 outputs could work independently.

Thanks for the thoughtfully considered response :).

Regarding security with Redis: did you read the docs here? https://www.elastic.co/guide/en/logstash/current/plugins-out... Logstash does support Redis password auth (as does Filebeat). Regarding encryption, seeing as Redis doesn't support SSL itself, are you using spiped as the official Redis docs recommend?

Regarding the two queues, I would like to clarify that you can do this with the:

Filebeat -> Logstash -> Redis -> Logstash -> (outputs) technique.

If you declare two Logstash Redis outputs in the first 'shipper' Logstash, you can write to two separate queues and have the second 'indexer' read from both.

It is true that if one output is down we will pause processing, but you can use multiple processes for that. It is possible that in the near future we will support multiple pipelines in a single process (which we already do internally in our master branch for metrics, just not in a publicly exposed way yet).
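
A rough Python sketch of the two-queue idea discussed above, using in-memory deques as stand-ins for the two Redis lists. The queue names and the `ship`/`drain` helpers are hypothetical and are not Logstash or Redis APIs. It assumes one consumer per queue (as you would get with separate processes), so a slow S3 consumer does not hold up Elasticsearch.

```python
from collections import deque

# Hypothetical queue names; in the setup above these would be two Redis lists.
queues = {"elasticsearch": deque(), "s3": deque()}


def ship(event):
    """Shipper stage: copy each event into every queue."""
    for q in queues.values():
        q.append(event)


def drain(name, sink, limit=None):
    """Indexer stage: move up to `limit` events from one queue into a sink."""
    q = queues[name]
    moved = 0
    while q and (limit is None or moved < limit):
        sink.append(q.popleft())
        moved += 1
    return moved


for line in ["a", "b", "c"]:
    ship(line)

es_sink, s3_sink = [], []
drain("elasticsearch", es_sink)  # this consumer keeps up and takes everything
drain("s3", s3_sink, limit=1)    # this consumer is slow and takes one event
```

The events the slow consumer has not taken yet stay in its own queue, while the other queue is already empty.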

Regarding JVM overhead: that's a fair point about memory. The JVM does have a cost. That said, memory and VMs are cheap these days, and that cost is fixed. One thing to be careful of is that we often see people surprised by a stray 100MB event going through their pipeline due to an application bug. Having that extra memory is a good idea regardless. We have many users increasing their heap size far beyond what the JVM requires simply to handle weird bursts of jumbo logs.

Thanks for that information. There's no doubt Logstash can do a lot, and it sounds like with the multiple pipeline feature Logstash will make it easier to do what we wanted to do in a single process.

In the past, we've also been burned by so many Big Data solutions running out of heap space that adding more processes that rely on tuning JVM parameters did not appeal to us.