I'd say its biggest differentiator from a typical messaging system is the ability to rewind and reconsume messages. It's meant to offload a large volume of data quickly and then retain it for some time so that it can be processed later on. Data is published to topics, and it is entirely feasible to read from one (or more) topics, process that data and then publish the results to a different topic. In comparison to Redis, I would say that while they overlap, they're each better suited to different problems. Redis is blazing fast, but its parallelism/replication story isn't as great as Kafka's. Redis is a lot easier to get running though.
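To make the rewind point concrete, here is a rough in-memory sketch. It is a toy model, not a Kafka client: `TopicLog`, `publish` and `read` are names made up for illustration. The only idea it tries to show is that reading does not remove anything, so a reader can reset its offset and go through the same messages again.

```python
# Toy in-memory model of a retained log; NOT the Kafka client API.
class TopicLog:
    """Append-only list of messages; each reader tracks its own offset."""

    def __init__(self):
        self._messages = []

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read(self, offset, max_messages=100):
        """Return messages starting at `offset`. Reading removes nothing."""
        return self._messages[offset:offset + max_messages]


raw = TopicLog()
for word in ["alpha", "beta", "gamma"]:
    raw.publish(word)

# First pass: read one topic, process the data, publish to a different topic.
processed = TopicLog()
offset = 0
for message in raw.read(offset):
    processed.publish(message.upper())
    offset += 1

# Rewind: reset the offset to 0 and the same messages are still there.
replayed = raw.read(0)
print(replayed)           # ['alpha', 'beta', 'gamma']
print(processed.read(0))  # ['ALPHA', 'BETA', 'GAMMA']
```

In a typical queue the first pass would have drained the messages; here the second read sees all of them again.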
Let's say you have a CMS which pushes content to your site. You also want to make the whole site searchable, so you index your content into (e.g.) Elasticsearch. Kafka is great for this because you can put the content onto Kafka's message queue and then have a service reading from it which then puts it into Elasticsearch. It scales well, too. So let's say your site takes off and you have hundreds of articles published a day (not to mention updates, deletions etc.) - these events can all be sent to Kafka and it will maintain their order (within a partition) while still being fast. You can also have many services reading (consuming) from it simultaneously and it will handle it nicely.
Basically, if you want to get data from one place to another and care about order, Kafka is a good solution. It acts as a middleman between services.
Kafka shines when you have multiple services that have data to publish and multiple services that need to read that data stream. If you have three services and they write to ES, publish metrics to some other store, and log events to the db, you could instead write that all to Kafka, and individual consumers can use the data (for instance, to put into ES). On the origin-service side, it has one integration point; it does not need to know about ES. Now let's say that your users want a near real-time dashboard of their data changes on your multiple services. All you do is make a new consumer from Kafka. You don't add it to your three services. Kafka simplifies your service relation graph.
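A small sketch of that fan-out, again as an in-memory toy rather than real Kafka code. `SharedLog`, the consumer names and the event fields are all hypothetical. The point is that each consumer keeps its own position, so adding the dashboard later touches none of the producers.

```python
# Toy model: one shared log, many independent consumers. NOT the Kafka API.
class SharedLog:
    def __init__(self):
        self.events = []
        self.offsets = {}  # consumer name -> next offset to read

    def publish(self, event):
        self.events.append(event)

    def poll(self, consumer):
        """Return events this consumer has not seen yet and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:]
        self.offsets[consumer] = len(self.events)
        return batch


log = SharedLog()

# Three services publish; none of them knows who will read the data.
log.publish({"service": "orders", "type": "created", "id": 1})
log.publish({"service": "billing", "type": "charged", "id": 1})
log.publish({"service": "users", "type": "signup", "id": 7})

# Consumer 1: feeds the search index (stand-in for writing to ES).
search_index = [event["id"] for event in log.poll("es-indexer")]

# Consumer 2: counts events per service (stand-in for a metrics store).
metrics = {}
for event in log.poll("metrics"):
    metrics[event["service"]] = metrics.get(event["service"], 0) + 1

# Added later: a dashboard consumer. The producers above are unchanged.
dashboard = [f'{event["service"]}:{event["type"]}' for event in log.poll("dashboard")]

print(search_index)  # [1, 1, 7]
print(metrics)       # {'orders': 1, 'billing': 1, 'users': 1}
print(dashboard)     # ['orders:created', 'billing:charged', 'users:signup']
```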
Well, I definitely support the example of using Kafka for analytics with a streaming solution like Flink or Spark, etc. However, I asked the "why not directly to ES" question because I felt the example of using Kafka just as a layer in front of ES kinda painted the Kafka layer as something "we could do because we can, not because we need to".
We just implemented Kinesis (AWS service similar to Kafka) to reduce load on our Elasticsearch database (~50GB) when running hundreds of individual jobs.
Individual tasks (implemented in Celery, actually running off Redis) push to a Kinesis stream which is then consumed in batches by a very simple processor.
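Roughly what "consumed in batches by a very simple processor" could look like, as a sketch only. This uses no Kinesis or Elasticsearch client; `bulk_index` is a hypothetical stand-in for a bulk write, and the batch size is arbitrary. The idea is just that many small writes from individual jobs become a few larger ones.

```python
def batches(records, size):
    """Yield consecutive slices of `records` with at most `size` items each."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


bulk_calls = []


def bulk_index(docs):
    """Hypothetical stand-in for one bulk write to the search store."""
    bulk_calls.append(list(docs))


# Pretend ten individual jobs each pushed one record onto the stream.
records = [{"job": n} for n in range(10)]

for batch in batches(records, size=4):
    bulk_index(batch)

print(len(bulk_calls))               # 3
print([len(b) for b in bulk_calls])  # [4, 4, 2]
```

Ten records end up as three writes instead of ten, which is where the load reduction comes from.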
Well, ES is pretty fast by itself - lots of people use it to store log entries (ELK stack), and every log line triggers an indexing event in ES.
Introducing Kafka into the mix just seems like an unnecessary complication.
It's not unnecessary if ES is just one of the endpoints. Kafka shines because it can accommodate a ton of consumers - so you can write once to Kafka and then use it to populate ES, databases, whatever. Furthermore, what happens if you need to re-index (say, if you update a mapping to an existing object)? It becomes trivial to reindex by replaying all the data from Kafka into ES, thus saving you a lot of time.
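A sketch of the reindex-by-replay idea, with plain Python data standing in for both sides: the list plays the retained Kafka topic and the dict plays the ES index. `build_index`, `mapping_v1` and `mapping_v2` are hypothetical names, and the "mapping" here is just a function that shapes a document.

```python
# The retained log of content events, oldest first (stand-in for a Kafka topic).
log = [
    {"id": 1, "title": "Hello World"},
    {"id": 2, "title": "Kafka Notes"},
    {"id": 1, "title": "Hello Again"},  # a later update to document 1
]


def build_index(events, mapping):
    """Replay every event in order; later events overwrite earlier ones."""
    index = {}
    for event in events:
        index[event["id"]] = mapping(event)
    return index


def mapping_v1(event):
    return {"title": event["title"]}


def mapping_v2(event):
    # The changed mapping adds a lowercased field for case-insensitive search.
    return {"title": event["title"], "title_lower": event["title"].lower()}


index = build_index(log, mapping_v1)

# The mapping changed: rebuild from offset 0 instead of re-exporting from the CMS.
index = build_index(log, mapping_v2)

print(index[1])  # {'title': 'Hello Again', 'title_lower': 'hello again'}
print(index[2])  # {'title': 'Kafka Notes', 'title_lower': 'kafka notes'}
```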
If you are just dumping into ES, then yes, probably not the best tool (though it wouldn't necessarily hurt) - just use the HTTP API for that. However if you want to build a robust pipeline for multiple services or think you'll be needing to scale the feed into ES, Kafka is useful.
Redis is a database, Kafka is a data logging system built for scale and throughput. Event processing (of any kind, like stocks, ad impressions, ecommerce purchases) is a great fit. It's also good as a message queue unless you need ultra low-latency RPC.
Kafka, Redis and Logstash+ElasticSearch have really nothing to do with each other.
Kafka is a distributed, fault-tolerant and highly scalable message broker.
Redis is a very fast key/value (and other data types) store.
I suppose that at a high level Logstash can be compared to Kafka but IME Logstash can't handle scale. It's trivially easy to bring Logstash to its knees.
Redis does a lot more than just store keys and values. Functionally speaking, Redis Pub/Sub and Kafka are interchangeable up to a certain level of throughput.
They really serve very different use cases. Redis pub/sub consumers only receive messages while connected, whereas Kafka consumers can pick up where they left off. Then there's event ordering, back pressure, etc.
There are many use cases Redis pub/sub can't serve beyond just scalability.
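The difference in delivery semantics can be sketched with two toy classes. Neither is the Redis or Kafka client API: `FireAndForgetChannel` and `RetainedLog` are hypothetical names that only model "deliver to whoever is connected right now" versus "store it and let the reader resume from an offset".

```python
class FireAndForgetChannel:
    """Pub/sub style: a message only reaches inboxes subscribed at publish time."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, inbox):
        self.subscribers.append(inbox)

    def unsubscribe(self, inbox):
        self.subscribers.remove(inbox)

    def publish(self, message):
        for inbox in self.subscribers:
            inbox.append(message)


class RetainedLog:
    """Log style: messages are stored; readers resume from their own offset."""

    def __init__(self):
        self.messages = []

    def publish(self, message):
        self.messages.append(message)

    def read_from(self, offset):
        return self.messages[offset:]


channel, log = FireAndForgetChannel(), RetainedLog()
pubsub_inbox, log_inbox, log_offset = [], [], 0

# Both consumers are online for the first message.
channel.subscribe(pubsub_inbox)
channel.publish("m1")
log.publish("m1")
log_inbox += log.read_from(log_offset)
log_offset = len(log_inbox)

# Both consumers go offline while "m2" is published.
channel.unsubscribe(pubsub_inbox)
channel.publish("m2")
log.publish("m2")

# Both come back, then "m3" is published.
channel.subscribe(pubsub_inbox)
channel.publish("m3")
log.publish("m3")
log_inbox += log.read_from(log_offset)

print(pubsub_inbox)  # ['m1', 'm3']  ("m2" was never delivered)
print(log_inbox)     # ['m1', 'm2', 'm3']
```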
Well, Logstash can output data into Kafka or ElasticSearch. So you could, for example, transform logs to JSON or do simple text processing in Logstash and put the result into ElasticSearch for log searching, but you can also put it into Kafka and then do stream parsing with all sorts of tools like Flink, Spark, etc.
You would then be able to do some realtime analysis on what users do all over your stack. Too many "Login Failed" events and maybe you have an attacker trying to brute-force a password, and maybe you need to present them with a captcha screen.
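The "too many Login Failed events" check could look something like this in a stream consumer. It is a plain Python sketch, not Flink or Spark code; the event tuple layout, the 60 second window and the threshold of 3 are all assumptions picked for illustration.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # assumed window length
MAX_FAILURES = 3     # assumed threshold before showing a captcha


def users_needing_captcha(events):
    """events: iterable of (timestamp_seconds, user, kind), ordered by time."""
    recent = defaultdict(deque)  # user -> timestamps of recent failures
    flagged = set()
    for timestamp, user, kind in events:
        if kind != "login_failed":
            continue
        window = recent[user]
        window.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_FAILURES:
            flagged.add(user)
    return flagged


events = [
    (0, "alice", "login_failed"),
    (0, "bob", "login_failed"),
    (10, "alice", "login_failed"),
    (20, "alice", "login_failed"),
    (25, "carol", "login_ok"),
    (30, "alice", "login_failed"),   # 4th failure within 60 seconds
    (100, "bob", "login_failed"),
    (200, "bob", "login_failed"),
    (300, "bob", "login_failed"),    # 4 failures, but spread far apart
]

print(users_needing_captcha(events))  # {'alice'}
```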
ElasticSearch is a database optimized for searching, not related at all.
You can somewhat compare Kafka to Logstash, but Kafka has no processing; it's purely a distributed log writing/reading/storage system that also scales far more than Logstash can. You write data to it and then read from it with a basic messaging abstraction of topics and partitions.
So can an RDBMS... But that doesn't mean that Kafka and databases are related.
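For anyone unfamiliar with the topics-and-partitions abstraction, here is a minimal sketch of keyed partitioning. It is an in-memory toy: the partition count, the keys and the use of CRC32 as the hash are assumptions for illustration, not what the Kafka client actually does. What it shows is that messages with the same key land in the same partition, so their relative order is kept.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for one topic


def partition_for(key):
    """Map a key to a partition. CRC32 is an illustrative choice of hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# One topic modelled as a list of append-only partitions.
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    ("article-1", "created"),
    ("article-2", "created"),
    ("article-1", "updated"),
    ("article-3", "created"),
    ("article-1", "deleted"),
]

for key, action in events:
    partitions[partition_for(key)].append((key, action))

# All events for one key sit in a single partition, in publish order.
home = partitions[partition_for("article-1")]
article_1_actions = [action for key, action in home if key == "article-1"]
print(article_1_actions)  # ['created', 'updated', 'deleted']
```

Ordering across different partitions is not modelled here, and as far as I know Kafka does not promise it either.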
As multiple comments have stated above, Kafka is really a distributed message subsystem. Its core interface is a set of topics that one can publish to, and that consumers can read from (in other words, a pub-sub system). Kafka doesn't inspect the message payload at all.
Elasticsearch is an unstructured (to some extent) document store that's optimized around document search. So at the very least, the payload is important when using Elasticsearch.
That's like saying they all write data to disk so they're related.
Elasticsearch is all about saving, inspecting, indexing and retrieving your data through a rich document-based model and search-optimized methods.
ES might be able to do the same thing functionally because it operates at a higher level, but ultimately it will never scale as well or be as simple to access as Kafka.