| HN Mirror


by kasey_junk 3307 days ago

People tend to talk about Kafka in the same conversations as messaging systems because it can support messaging use cases, but that leads to an incorrect mental model of what Kafka is (and, conversely, of what messaging systems tend to be).

It's better to think of Kafka as a distributed log service than as a messaging broker. When described this way, don't think of a log as "the things humans look at to debug applications coming out of stdout" but as "the things machines look at as a storage data structure". Think of the write-ahead log in a database, not printf statements.

What this means is that under the covers Kafka is a bunch of ordered files being written to by producers. Consumers can specify where in the log they want to start consuming from and then "tail" the log once they are caught up. The architecture is also such that consumers are very lightweight (a TCP/IP connection and an offset in the log). This also makes durability and the "late joiner" problem in messaging systems trivial: a consumer that shows up late just starts reading from an earlier offset. Kafka then layers on high availability and consistency configurations that mean you can be sure that your published log entries are 1) stored on multiple machines and 2) ordered the same for everyone. The combination of those two things is very powerful and makes reasoning about application data in distributed systems much simpler. There are also certain classes of problems that need those promises to be solved, namely things that are not idempotent.
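To make the "offset in a log" idea concrete, here is a minimal in-memory sketch. It is not Kafka's API or storage format; the `Log` and `Consumer` classes and their methods are hypothetical names made up for illustration, and replication, partitions, and disk persistence are all left out.

```python
class Log:
    """Toy append-only log (hypothetical, not Kafka's implementation)."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        # Producers only ever add to the end; the return value is the
        # offset assigned to the new entry.
        self._entries.append(entry)
        return len(self._entries) - 1

    def read(self, offset, max_entries=100):
        # Reading never removes anything, so any consumer can re-read.
        return self._entries[offset:offset + max_entries]


class Consumer:
    """A consumer is little more than a reference to the log plus an offset."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        # Fetch whatever is past our offset, then advance the offset.
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


log = Log()
for entry in ["a", "b", "c"]:
    log.append(entry)

early = Consumer(log)
early.poll()                      # catches up on everything so far
log.append("d")
early.poll()                      # "tails" the log: only the new entry

late = Consumer(log, offset=0)    # a late joiner simply starts at offset 0
late.poll()                       # sees the full history in the same order
```

Because reads don't consume entries, the late joiner needs no help from the producer or from other consumers; it only needs to pick a starting offset.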

NSQ is a much more traditional buffered messaging system. It has file durability, but only as a) an optimization to prevent message loss once memory runs out and b) a consumer archive. A hard loss of a node means that messages that have not yet been delivered can be lost, as there is no promise they are published somewhere else. Further, there is no promise that the order of messages published to a topic and channel is the order of messages received by the consumer. Dealing with late joiners is an application-level concern, as are archiving and replication.
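For contrast, here is a toy model of a buffered queue that spills to disk once memory is full. It is not NSQ's code; `BufferedChannel`, `mem_limit`, and the `disk` list (a stand-in for an overflow file) are hypothetical names, and it only illustrates how delivery-removes-the-message and a memory/disk split can lead to the properties described above.

```python
from collections import deque


class BufferedChannel:
    """Toy buffered queue: memory first, overflow to a stand-in 'disk' list."""

    def __init__(self, mem_limit):
        self.mem_limit = mem_limit
        self.memory = deque()
        self.disk = []  # stand-in for an on-disk overflow file

    def publish(self, msg):
        # Messages go to memory while there is room, otherwise to "disk".
        if len(self.memory) < self.mem_limit:
            self.memory.append(msg)
        else:
            self.disk.append(msg)

    def receive(self):
        # Delivery removes the message; there is no offset to rewind to,
        # so a consumer that joins later never sees it.
        if self.memory:
            return self.memory.popleft()
        if self.disk:
            return self.disk.pop(0)
        return None


channel = BufferedChannel(mem_limit=2)
for msg in ["m1", "m2", "m3"]:    # m3 overflows to "disk"
    channel.publish(msg)

channel.receive()                 # m1, which frees a slot in memory
channel.publish("m4")             # m4 lands in memory, ahead of m3
received = [channel.receive() for _ in range(3)]
# received is ["m2", "m4", "m3"]: not the order they were published in
```

In this model nothing is kept after delivery and nothing exists on a second machine, so replay, archiving, and replication would all have to be built by the application.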

That said, Kafka is complicated. I think it's complicated because it's solving a complicated problem, not because it's poorly factored (though I'd love it if they built the consensus stuff in directly and removed the ZooKeeper dependency).

NSQ isn't complicated and is easy to operate. If your problem set falls into a more traditional messaging domain that fits the NSQ model, you are almost certainly better off with it, though you could likely use Kafka as well. If your problem set falls into the write-ahead log model, you can't use NSQ (without massive application-level logic) but you can use Kafka.

1 comment

StavrosK 3306 days ago

That's a great comparison, thank you. Basically, the "log structure vs in-memory queue" distinction at the start crystallizes the differences, thanks for the reply.
