Hacker News new | ask | show | jobs
by yawniek 1610 days ago
I dont understand why almost nobody wants to understand the difference between message brokers and a distributed log and its implications.
3 comments

Because almost everyone refers to Kafka as a queue. And Kafka isn’t a queue.
Doesn't confluent offer a JMS client for kafka? So certainly you can use kafka as a queue.
It may be possible to talk to it via JMS. There are also bridges for AMQP. That still does not make Kafka a queue. You can use like a queue by sugarcoating over it but Kafka itself isn't a queue.
You need to pre-commit to the amount of parallelism you want, and work-stealing or any other out-of-order processing is nearly impossible. Any Kafka client bug tracker is full of people who are confused (or worse) about this.

We use Kafka as a queue because we understand it very well, but it has a lot of limitations compared to "purpose-built" queuing services.

You mean pre-commit to the maximum parallelism by setting the number of partitions or am I missing something?

In that case, sure, theoretically with JMS you can spawn as many parallel consumers as you want and with kafka you're limited to the number of partitions you've configured your topic with, as you can only have one partition per consumer.

But you get so much more out of kafka with consumer groups and partitions when you do complex message processing that would be a lot harder with traditional queues (if you partition based on the same key).

> You mean pre-commit to the maximum parallelism by setting the number of partitions?

Yes. Then depending on the data it can be difficult-to-impossible to scale past that. Scaling down, of course you can always start fewer consumers, but unless you have many more partitions than consumers or partitions as a multiple of consumers, the load will be unbalanced.

> But you get so much more out of kafka with consumer groups and partitions when you do complex message processing

Well, what's "complex message processing"? If you mean the stream topology is complex or the operations benefit from key sharding I agree. If you mean that some atomic operation is complex, no, it's irrelevant or even bad e.g. the complexity means you need a DLQ/OOO retries, or load per item is so unpredictable you want to work-steal.

How difficult it is to scale depends on your requirements, but if those aren't strict it's as easy as adjusting a parameter. In my experience it's not hard at all.

If you can live with not strictly ordered messages around scale time you just rescale. If you can live with some latency you stop producing, wait until lag is zero, scale, and then start producing again.

Plus if you pick a partition number with nice divisors like 6, 12, 20, 24, 40 or 60 you can have balanced consumption with different number of consumers.

Because with kafka you can do (almost? exactly once delivery? transactions?) everything you can do with RabbitMq and a lot more, and everybody wants a piece of the action, but not everybody is prepared to pay the costs.
Can you give a tldr/eli5?
Unlike a queue it is common for events to be: Kept indefinitely for future re-consumption. Partitioned with different consumers seeing different slices of the data.
Kept indefinitely for future re-consumption.

Are there any stories of this saving someone from a potential disaster? My experience has been that this only causes bugs, such as resending hundreds of thoudands of out-of-date emails.

It's not usually intended as a disaster recovery strategy; it's more often intended so that the event stream can be ingested in future by completely new consumers.

Retention policies and compaction exist where you don't actually want to keep the data, but the capability is one of the distinguishing features.

You can make an event log look like a database or a queue or a cache - but if you do that you should definitely consider whether you are using the right tool for the job.

Independently of application logic (which also sometimes uses it), it's the primary mechanism in Kafka for handling high-throughput consumers while still recovering from failure. Consumers grab a batch of e.g. 1000 events, checkpoint every X seconds while processing them; if they die the events are still there and they restart from the last checkpoint.

It also means every message in Kafka is "addressable" via topic/partition/offset which lets you refer to "foreign" messages etc.

Being able to replay the contents of a Kafka log is a very common recovery mechanism for many distributed systems and is used all the time.
A log keeps and protects everything it receives in a list, a broker just distributes things and do not keep a record of what it sends/receives