| HN Mirror



	by nodesocket 3705 days ago
	Can somebody provide a real-life use case for Kafka? I've seen comparisons between Redis, but what specifically does Kafka solve that Redis cannot?

3 comments

ChartsNGraffs 3705 days ago

I'd say its biggest differentiator from a typical messaging system is the ability to rewind and reconsume messages. It's meant to offload a large volume of data quickly and then retain it for some time so that it can be processed later on. Data is published to topics, and it is entirely feasible to read from one (or more) topics, process that data and then publish the results to a different topic. In comparison to Redis, I would say that while they overlap they're each better suited for different problems. Redis is blazing fast, but its parallelism/replication story isn't as great as Kafka's. Redis is a lot easier to get running though.
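
To make the "rewind and reconsume" point concrete, here is a minimal sketch using a toy in-memory log. This is not the Kafka client API; `TopicLog`, `Consumer`, `poll` and `seek` are hypothetical names chosen for illustration, and the log stands in for a single partition.

```python
class TopicLog:
    """Toy append-only log; a stand-in for one partition (hypothetical)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset, max_records=100):
        # Reading never removes anything; records stay available.
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own position in the log instead of deleting messages."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # Rewinding is just moving the position back.
        self.offset = offset


log = TopicLog()
for event in ["a", "b", "c"]:
    log.append(event)

consumer = Consumer(log)
first_pass = consumer.poll()
consumer.seek(0)              # rewind to the beginning
second_pass = consumer.poll()  # the same records are delivered again
```

The design point is that consuming only advances a position; it does not remove data, which is what makes a second pass possible.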

yolesaber 3705 days ago

Let's say you have a CMS which pushes content to your site. You also want to make the whole site searchable, so you index your content into (e.g.) Elasticsearch. Kafka is great for this because you can put the content onto Kafka's message queue and then have a service reading from it which then puts it into Elasticsearch. It scales well, too. So let's say your site takes off and you have hundreds of articles published a day (not to mention updates, deletions etc.) - these events can all be sent to Kafka and it will maintain the order as well as still be fast. You can also have many, many services reading (consuming) from it simultaneously and it will handle it nicely.

Basically, if you want to get data from one place to another and care about order, Kafka is a good solution. It acts as a middleman between services.

balamaci 3705 days ago

Hm but why would you not send it directly to ElasticSearch?

sethammons 3704 days ago

Kafka shines when you have multiple services that have data to publish and multiple services that need to read that data stream. If you have three services and they write to ES, publish metrics to some other store, and log events to the db, you could instead write that all to Kafka, and individual consumers can use the data (for instance, to put into ES). On the origin-service side, it has one integration point; it does not need to know about ES. Now let's say that your users want a near real-time dashboard of their data changes on your multiple services. All you do is make a new consumer from Kafka. You don't add it to your three services. Kafka simplifies your service relation graph.
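
A rough sketch of that fan-out, assuming a toy in-memory broker rather than Kafka itself. `Broker`, `publish`, `poll` and the consumer names are hypothetical; the point is only that each consumer keeps its own offset, so adding the dashboard requires no change on the producing side.

```python
from collections import defaultdict


class Broker:
    """Toy single-topic broker with per-consumer offsets (hypothetical)."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # consumer name -> next offset to read

    def publish(self, event):
        self.log.append(event)

    def poll(self, consumer_name):
        start = self.offsets[consumer_name]
        batch = self.log[start:]
        self.offsets[consumer_name] = len(self.log)
        return batch


broker = Broker()

# Three origin services each have one integration point: the broker.
for service in ("billing", "accounts", "content"):
    broker.publish({"source": service, "change": "updated"})

search_index = broker.poll("es-indexer")  # would feed Elasticsearch
metrics = broker.poll("metrics")          # would feed a metrics store

broker.publish({"source": "content", "change": "deleted"})

# A consumer added later starts from offset 0 and sees the full history.
dashboard = broker.poll("dashboard")
# An existing consumer only sees what is new since its last poll.
search_more = broker.poll("es-indexer")
```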

balamaci 3704 days ago

Well, I definitely support the example of using Kafka for analytics with a streaming solution like Flink or Spark, etc. However, I asked the "why not directly to ES" question because I felt the example of using Kafka just as a layer in front of ES kinda painted the Kafka layer as something "we could do because we can, not because we need to".

dkersten 3704 days ago

Because then you have the problem of dual-writes.

https://martin.kleppmann.com/2015/05/27/logs-for-data-infras...

evgenyp 3704 days ago

One key ability is to batch updates.

We just implemented Kinesis (AWS service similar to Kafka) to reduce load on our Elasticsearch database (~50GB) when running hundreds of individual jobs.

Individual tasks (implemented in Celery, actually running off Redis) push to a Kinesis stream which is then consumed in batches by a very simple processor.
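
The batching idea can be sketched as follows. This is an illustration only, not the Kinesis or Elasticsearch API: `batches`, `bulk_index` and the batch size of 3 are assumptions, and `bulk_index` just records what a real bulk request would contain.

```python
def batches(records, batch_size):
    """Group a stream of records into lists of at most batch_size items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


bulk_requests = []


def bulk_index(docs):
    # Hypothetical stand-in for one bulk write to the search index.
    bulk_requests.append(list(docs))


# Seven individual job results arrive on the stream...
stream = [{"job_id": i} for i in range(7)]

# ...but the processor issues one write per batch instead of one per record.
for batch in batches(stream, batch_size=3):
    bulk_index(batch)
```

With these inputs the index receives three bulk writes instead of seven single writes.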

saryant 3704 days ago

When you need to reindex, all the original writes are still on your Kafka topic and the consumer that fed ElasticSearch can replay from the start.

kubek2k 3705 days ago

because:
* you can do ES indexing async
* having articles indexed instantly is not critical (I guess)

balamaci 3704 days ago

Well, ES is pretty fast by itself - lots of people use it to store log entries (ELK stack) and every log line triggers an indexing event in ES. Introducing Kafka into the mix just seems like an unnecessary complication.

yolesaber 3704 days ago

It's not unnecessary if ES is just one of the endpoints. Kafka shines because it can accommodate a ton of consumers - so you can write once to Kafka and then use it to populate ES, databases, whatever. Furthermore, what happens if you need to re-index (say, if you update a mapping to an existing object)? It becomes trivial to reindex by replaying all the data from Kafka into ES, thus saving you a lot of time.

If you are just dumping into ES, then yes, probably not the best tool (though it wouldn't necessarily hurt) - just use the HTTP API for that. However if you want to build a robust pipeline for multiple services or think you'll be needing to scale the feed into ES, Kafka is useful.

manigandham 3705 days ago

Kafka and Redis are very different things - see this: https://news.ycombinator.com/item?id=11577312

Redis is a database, Kafka is a data logging system built for scale and throughput. Event processing (of any kind, like stocks, ad impressions, ecommerce purchases) is a great fit. It's also good as a message queue unless you need ultra low-latency RPC.

nodesocket 3705 days ago

Gotcha, so then what's the advantage of Kafka over Logstash + ElasticSearch?

saryant 3705 days ago

Kafka, Redis and Logstash+ElasticSearch have really nothing to do with each other.

Kafka is a distributed, fault-tolerant and highly scalable message broker.

Redis is a very fast key/value (and other data types) store.

I suppose that at a high level Logstash can be compared to Kafka but IME Logstash can't handle scale. It's trivially easy to bring Logstash to its knees.

Elasticsearch is, well, a search engine.

010a 3705 days ago

Redis does a lot more than just store keys and values. Functionally speaking, Redis Pub/Sub and Kafka are interchangeable up to a certain level of throughput.

saryant 3705 days ago

They really serve very different use cases. Redis pub/sub consumers only receive messages while connected, whereas Kafka consumers can pick up where they left off. Event ordering, back pressure, etc.

There are many use cases Redis pub/sub can't serve beyond just scalability.
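
A small sketch of the "only while connected" difference, using two toy in-memory classes. Neither is the Redis or Kafka client; `PubSub`, `Log` and `read_from` are hypothetical names, and the consumer's committed offset is kept in a plain variable.

```python
class PubSub:
    """Fire-and-forget delivery: only current subscribers get a message."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, inbox):
        self.subscribers.append(inbox)

    def unsubscribe(self, inbox):
        self.subscribers.remove(inbox)

    def publish(self, msg):
        for inbox in self.subscribers:
            inbox.append(msg)


class Log:
    """Retained log: messages stay readable from any offset."""

    def __init__(self):
        self.records = []

    def publish(self, msg):
        self.records.append(msg)

    def read_from(self, offset):
        return self.records[offset:]


pubsub, log = PubSub(), Log()
inbox = []
pubsub.subscribe(inbox)
committed = 0  # the log consumer's saved position

pubsub.publish("m1")
log.publish("m1")
committed += len(log.read_from(committed))  # log consumer has handled m1

pubsub.unsubscribe(inbox)  # both consumers go offline
pubsub.publish("m2")
log.publish("m2")

pubsub.subscribe(inbox)    # both consumers come back
pubsub.publish("m3")
log.publish("m3")

caught_up = log.read_from(committed)  # resumes where it left off
```

The pub/sub inbox never sees `m2`, while the log consumer picks it up on reconnect because its position, not its connection, decides what it reads next.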

balamaci 3705 days ago

Well, Logstash can output data into Kafka or ElasticSearch. So you could, for example, transform logs to JSON or do simple text processing in Logstash and put the result into ElasticSearch for log searching, but you can also put it in Kafka and then do stream parsing with all sorts of tools like Flink, Spark, etc. You would then have the possibility to do some realtime analysis on what the users do all over your stack. Too many "Login Failed" events and maybe you have an attacker trying to bruteforce a password, and maybe you need to present them with a captcha screen.

manigandham 3705 days ago

ElasticSearch is a database optimized for searching, not related at all.

You can somewhat compare Kafka to Logstash but Kafka has no processing, it's purely a distributed log writing/reading/storage system that also scales far more than logstash can. You write data to it and then read from it with a basic messaging abstraction of topics and partitions.
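
As an illustration of the topics-and-partitions abstraction, here is a toy sketch of key-based partitioning. It is not Kafka's implementation: `send`, `partition_for` and the choice of three partitions are assumptions, and `zlib.crc32` is only a stand-in for whatever hash a real client uses. What it shows is that records with the same key land in the same partition, so their relative order is preserved there.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for one topic
partitions = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(key):
    # Deterministic hash of the key, reduced to a partition index.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def send(key, value):
    """Append a record to the partition chosen by its key."""
    p = partition_for(key)
    partitions[p].append((key, value))
    return p, len(partitions[p]) - 1  # (partition, offset)


events = [
    ("article-1", "created"),
    ("article-2", "created"),
    ("article-1", "updated"),
    ("article-1", "deleted"),
]
for key, value in events:
    send(key, value)

# All events for one key sit in one partition, in the order they were sent.
article_1 = [
    value
    for key, value in partitions[partition_for("article-1")]
    if key == "article-1"
]
```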

ec109685 3705 days ago

ElasticSearch can store sequenced number data, which is really all that Kafka is doing, so I don't think it is fair to say it isn't related at all.

allengeorge 3704 days ago

So can a RDBMS... But that doesn't mean that Kafka and databases are related.

As multiple comments have stated above, Kafka is really a distributed message subsystem. Its core interface is a set of topics that one can publish to, and that consumers can read from (in other words, a pub-sub system). Kafka doesn't inspect the message payload at all.

Elasticsearch is an unstructured (to some extent) document store that's optimized around document search. So at the very least, the payload is important when using Elasticsearch.

manigandham 3704 days ago

That's like saying they all write data to disk so they're related.

Elasticsearch is all about saving, inspecting, indexing and retrieving your data through a rich document-based model and search-optimized methods.

ES might be able to do the same thing functionally because it operates at a higher level, but ultimately it will never scale or be as simple in access as Kafka.