Hacker News
by Serow225 1864 days ago
I feel dumb saying it, but as someone who's had a lot of experience with message buses (Rabbit and AMQP1.0), I've always struggled to understand which domains or situations Kafka is actually the best fit for. Probably the areas I work in just don't make it obvious, but I'd love to hear what exact scenarios it does actually make sense to use Kafka for instead of Rabbit or AMQP1.0 :)

RabbitMQ is one of those things that I've always found better to let the experts run (managed SaaS), unless your team really wants to take on the burden of becoming an Erlang distributed systems debugger :)

Pulsar seems really interesting... There are now more managed Pulsar offerings coming online (StreamNative; DataStax, which bought Kesque; Pandio; etc.)

4 comments

Well, they’re two different things. AMQP 1.0 is everyone-to-everyone messaging, where every party can be both a consumer and a server. RabbitMQ, traditionally, is a queue: you add a message, a consumer takes it and locks it, and if there is no processing confirmation, it is released back for someone else to process. Kafka is an append-only log: you put a message in and consumers just roll over the messages in order. Rabbit/AMQP gives you random access; Kafka sucks at random access. With AMQP you’ll have hundreds or thousands of queues, which may be difficult with Kafka.
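To make the queue semantics concrete, here is a toy in-memory model of the take/lock/ack/release cycle. It is a sketch under simplifying assumptions (single process, an explicit release call standing in for a timeout or a dropped connection), and ToyQueue and its methods are hypothetical names, not RabbitMQ's or any client library's API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Toy queue: a taken message is locked until acked, or released back."""
    ready: deque = field(default_factory=deque)
    in_flight: dict = field(default_factory=dict)  # delivery tag -> locked message
    next_tag: int = 0

    def publish(self, msg):
        self.ready.append(msg)

    def take(self):
        """Lock the next ready message for one consumer; None if nothing is ready."""
        if not self.ready:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.in_flight[tag] = self.ready.popleft()
        return tag, self.in_flight[tag]

    def ack(self, tag):
        """Processing confirmed: the message is removed for good."""
        del self.in_flight[tag]

    def release(self, tag):
        """No confirmation (e.g. the consumer died): hand it back to the front."""
        self.ready.appendleft(self.in_flight.pop(tag))


q = ToyQueue()
q.publish("a")
q.publish("b")
tag_a, _ = q.take()  # consumer 1 locks "a"
tag_b, _ = q.take()  # consumer 2 locks "b"
q.ack(tag_b)         # "b" is done
q.release(tag_a)     # consumer 1 never confirmed, so "a" is up for grabs again
print(list(q.ready))  # ['a']
```

Each message is handled individually here, which is what makes per-message redelivery easy; a log consumer instead just advances a position.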

You’d use Kafka more as an unbounded buffer and build different paradigms on top of it. It’s not unusual to ingest hundreds of Mbit/s of data into Kafka, potentially saturating the network while also reading that much out. AMQP is better for a large number of queues where each queue holds fewer messages. Think MQTT, WebSockets: many, many consumers.

It would be reasonable to use both next to each other.

But I’d never go for RabbitMQ. I’d go for Azure Service Bus or Artemis with Qpid.

Thank you, that is quite helpful :)

Kafka is distinctly different from enterprise messaging systems like AMQP.

I generally think of messaging systems as falling into four distinct categories: PubSub, Streaming, Queues and Enterprise Messaging Systems.

PubSub systems are focused on (usually) non-durable, low-latency messaging, generally without acknowledgements and generally at-most-once, e.g. Redis PUBSUB, NATS, etc.

Queues are generally focused on fanning durable messages out to multiple consumers, with acknowledgements and at-least-once processing, e.g. Celery/Sidekiq, Que, AWS SQS.

Streaming systems are designed for throughput and are usually based on some form of distributed log. They generally offload offset management to consumers, e.g. Kafka, Kinesis.

Enterprise Messaging Systems favor flexibility above all else and usually have some mechanism for encoding the flow of data separately from the applications themselves, e.g. exchange routing topologies in AMQP. They can generally implement pubsub, queue and direct messaging paradigms. The tradeoffs are poorer availability, more complexity and worse performance compared with specialised systems. Examples: RabbitMQ, HornetQ, etc.

So you end up using Kafka when its limitations aren't a problem and you need the throughput. It works best when the messages in a stream are homogeneous, so that a failure to process one message is unlikely to be independent of a failure to process the next: if one fails, the following ones would probably fail too. This alleviates the main drawback of streaming systems, which is head-of-line blocking.
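Head-of-line blocking can be sketched with a toy log consumer that must process entries strictly in order and only advances its offset after a success. This is a simplified model, not the Kafka client API; consume, handler and max_attempts are hypothetical names. If the failure is caused by something every message depends on (say, a downstream outage), the blocked messages would have failed anyway; if it is one bad message, everything behind it is stuck.

```python
def consume(log, committed, handler, max_attempts=3):
    """Process log entries in order starting at `committed`.

    Returns the new committed offset. A message that keeps failing stops
    progress, so everything behind it waits (head-of-line blocking).
    """
    offset = committed
    while offset < len(log):
        for _ in range(max_attempts):
            try:
                handler(log[offset])
                break
            except Exception:
                continue  # retry the same entry
        else:
            # Retries exhausted: skipping ahead would break ordering,
            # so stop and leave the offset pointing at the failed entry.
            return offset
        offset += 1
    return offset


def handler(msg):
    if msg == "poison":
        raise ValueError(msg)


log = ["ok-1", "poison", "ok-2", "ok-3"]
print(consume(log, 0, handler))  # 1: "ok-2" and "ok-3" are stuck behind "poison"
```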

Some cases where it works very well are event streams, data replication/CDC, etc.

Thank you, that is very helpful :)

> So you end up using Kafka when its limitations aren't a problem and you need the throughput.

I think that's what I've run into so far: my use cases haven't needed Kafka's throughput, so all I'm left with are the feature gaps I'd miss from something like AMQP1.0.

Apache Pulsar looks pretty interesting :)

> I'd love to hear what exact scenarios it does actually make sense to use Kafka for instead of Rabbit or AMQP1.0 :)

If the order of message processing matters, then Kafka is better suited than AMQP. For example, in a distributed application for money transfers, if AMQP is used, message order can be lost and problems will occur in the following scenario:

User A, with an account balance of $1000, orders two transfers: T1 ($600) and T2 ($500).

  - Rabbit delivers T1 to server1; before processing the message, server1 enters a full GC.
  - Rabbit delivers T2 to server2, and server2 processes it immediately; User A's account now has $500.
  - Server1 resumes after the GC pause, but fails to process T1 since the account balance is less than the required amount.

However, it is T2 that should have failed, because User A ordered T1 first and T2 second.
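The scenario can be replayed in a single process to show that the outcome depends only on processing order. This sketch assumes the balance check is a simple "refuse to overdraw" rule and does not model real concurrency; apply_transfer, run and InsufficientFunds are hypothetical names.

```python
class InsufficientFunds(Exception):
    pass


def apply_transfer(balance, amount):
    """Debit `amount` from `balance`, refusing to overdraw."""
    if amount > balance:
        raise InsufficientFunds(f"need {amount}, have {balance}")
    return balance - amount


def run(order, transfers, balance):
    """Apply transfers in the given processing order; report which ones failed."""
    failed = []
    for name in order:
        try:
            balance = apply_transfer(balance, transfers[name])
        except InsufficientFunds:
            failed.append(name)
    return balance, failed


transfers = {"T1": 600, "T2": 500}
# The order User A intended: T1 succeeds, T2 fails.
print(run(["T1", "T2"], transfers, 1000))  # (400, ['T2'])
# server1 is stuck in GC, so server2 finishes T2 first: the wrong transfer fails.
print(run(["T2", "T1"], transfers, 1000))  # (500, ['T1'])
```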

In Kafka, when the user account identifier is used as the partitioning key, all of User A's messages will be processed by the same consumer (i.e. server1), so even if server1 enters a full GC, that is OK, since T2 will still be processed after T1.
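Key-based partitioning can be sketched as a stable hash of the key modulo the partition count. Kafka's default partitioner hashes keyed messages with murmur2; this sketch substitutes zlib.crc32 so it runs on the standard library alone, so its partition numbers will not match a real cluster. NUM_PARTITIONS, partition_for and produce are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: the topic has 4 partitions


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable key -> partition mapping (crc32 stands in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events):
    """Append (key, value) events to per-partition logs, preserving send order."""
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key)].append((key, value))
    return partitions


events = [("userA", "T1:600"), ("userB", "X:10"), ("userA", "T2:500")]
parts = produce(events)
p = partition_for("userA")
# A single consumer owns partition p, so it sees userA's T1 before T2.
print([v for k, v in parts[p] if k == "userA"])  # ['T1:600', 'T2:500']
```

Other keys may share partition p, which is why the last line filters by key; what matters is that one key never spreads across partitions.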

FWIW, AMQP1.0 does support Sessions, which can be used to address this scenario :)

Kafka was designed for scale and developed at LinkedIn to meet their extremely high throughput requirements. It's a distributed log: basically, writing to an append-only file that's partitioned by a hash of a key (which can be set on every message).

It keeps the brokers as dumb as possible to optimize for performance, and the logic sits in the client. You can ask to read back from the log at any point, or from the current tail. Acknowledging messages is just writing a bookmark to another topic saying where you last read up to, or you can keep track of it yourself somewhere else.
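A toy model of "acknowledging is just writing a bookmark": the broker side only appends and serves reads by offset, and consumer positions are themselves appended to a separate log, loosely modelled on Kafka's internal __consumer_offsets topic. Log, OffsetStore and the group/topic names are hypothetical; this is not the Kafka client API.

```python
class Log:
    """Toy append-only log: it only appends and serves reads by offset."""

    def __init__(self):
        self.entries = []

    def append(self, value):
        self.entries.append(value)
        return len(self.entries) - 1  # offset of the new entry

    def read(self, offset, max_records=10):
        return self.entries[offset:offset + max_records]

    def end_offset(self):
        return len(self.entries)


class OffsetStore:
    """Bookmarks kept in their own log; the latest commit per (group, topic) wins."""

    def __init__(self):
        self.commits = Log()

    def commit(self, group, topic, offset):
        self.commits.append((group, topic, offset))

    def committed(self, group, topic, default=0):
        for g, t, off in reversed(self.commits.entries):
            if (g, t) == (group, topic):
                return off
        return default


orders = Log()
for i in range(5):
    orders.append(f"order-{i}")

offsets = OffsetStore()
start = offsets.committed("billing", "orders")
batch = orders.read(start, max_records=3)
offsets.commit("billing", "orders", start + len(batch))  # "ack" = bookmark position 3

# A restarted billing consumer resumes after its bookmark...
print(orders.read(offsets.committed("billing", "orders")))  # ['order-3', 'order-4']
# ...while another reader can start from the beginning or from the tail.
print(orders.read(0, max_records=2))     # ['order-0', 'order-1']
print(orders.read(orders.end_offset()))  # []
```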

You can always build more complex logic on top, which Confluent has done with things like ksqlDB.