Redis vs. Kafka vs. RabbitMQ

Y	Hacker News new \| ask \| show \| jobs

	Redis vs. Kafka vs. RabbitMQ (blog.devgenius.io)
	39 points by FragrantRiver 1610 days ago

11 comments

detaro 1610 days ago

Seems to be copied from https://otonomo.io/redis-kafka-or-rabbitmq-which-microservic... (where it has a date from 2019)

This same blog already had a copied submission yesterday, so it seems it regularly does this: https://news.ycombinator.com/item?id=30061113

link

yawniek 1610 days ago

I dont understand why almost nobody wants to understand the difference between message brokers and a distributed log and its implications.

link

rad_gruchalski 1610 days ago

Because almost everyone refers to Kafka as a queue. And Kafka isn’t a queue.

link

raducu 1610 days ago

Doesn't confluent offer a JMS client for kafka? So certainly you can use kafka as a queue.

link

rad_gruchalski 1610 days ago

It may be possible to talk to it via JMS. There are also bridges for AMQP. That still does not make Kafka a queue. You can use like a queue by sugarcoating over it but Kafka itself isn't a queue.

link

morelisp 1610 days ago

You need to pre-commit to the amount of parallelism you want, and work-stealing or any other out-of-order processing is nearly impossible. Any Kafka client bug tracker is full of people who are confused (or worse) about this.

We use Kafka as a queue because we understand it very well, but it has a lot of limitations compared to "purpose-built" queuing services.

link

raducu 1610 days ago

You mean pre-commit to the maximum parallelism by setting the number of partitions or am I missing something?

In that case, sure, theoretically with JMS you can spawn as many parallel consumers as you want and with kafka you're limited to the number of partitions you've configured your topic with, as you can only have one partition per consumer.

But you get so much more out of kafka with consumer groups and partitions when you do complex message processing that would be a lot harder with traditional queues (if you partition based on the same key).

link

morelisp 1610 days ago

> You mean pre-commit to the maximum parallelism by setting the number of partitions?

Yes. Then depending on the data it can be difficult-to-impossible to scale past that. Scaling down, of course you can always start fewer consumers, but unless you have many more partitions than consumers or partitions as a multiple of consumers, the load will be unbalanced.

> But you get so much more out of kafka with consumer groups and partitions when you do complex message processing

Well, what's "complex message processing"? If you mean the stream topology is complex or the operations benefit from key sharding I agree. If you mean that some atomic operation is complex, no, it's irrelevant or even bad e.g. the complexity means you need a DLQ/OOO retries, or load per item is so unpredictable you want to work-steal.

link

raducu 1610 days ago

Because with kafka you can do (almost? exactly once delivery? transactions?) everything you can do with RabbitMq and a lot more, and everybody wants a piece of the action, but not everybody is prepared to pay the costs.

link

jstx1 1610 days ago

Can you give a tldr/eli5?

link

dcminter 1610 days ago

Unlike a queue it is common for events to be: Kept indefinitely for future re-consumption. Partitioned with different consumers seeing different slices of the data.

link

nitrogen 1610 days ago

Kept indefinitely for future re-consumption.

Are there any stories of this saving someone from a potential disaster? My experience has been that this only causes bugs, such as resending hundreds of thoudands of out-of-date emails.

link

dcminter 1610 days ago

It's not usually intended as a disaster recovery strategy; it's more often intended so that the event stream can be ingested in future by completely new consumers.

Retention policies and compaction exist where you don't actually want to keep the data, but the capability is one of the distinguishing features.

You can make an event log look like a database or a queue or a cache - but if you do that you should definitely consider whether you are using the right tool for the job.

link

morelisp 1610 days ago

Independently of application logic (which also sometimes uses it), it's the primary mechanism in Kafka for handling high-throughput consumers while still recovering from failure. Consumers grab a batch of e.g. 1000 events, checkpoint every X seconds while processing them; if they die the events are still there and they restart from the last checkpoint.

It also means every message in Kafka is "addressable" via topic/partition/offset which lets you refer to "foreign" messages etc.

link

jpgvm 1610 days ago

Being able to replay the contents of a Kafka log is a very common recovery mechanism for many distributed systems and is used all the time.

link

Fiahil 1610 days ago

A log keeps and protects everything it receives in a list, a broker just distributes things and do not keep a record of what it sends/receives

link

mindwok 1610 days ago

Expect performance issues when running RabbitMQ with persistence? What kind of issues?

Also the comment about Redis not being persistent, Redis has lots of persistence options that allow you to choose trade offs between durability and performance, with the most strict setting using an append only log of every query.

Decent 1000 mile view of these solutions, but some more depth would have been nice.

link

fnord123 1610 days ago

There issues if you run with mirroring. It's advised that you use quorum queues here:

https://www.cloudamqp.com/blog/reasons-you-should-switch-to-...

However many people using Rabbit are using it with Celery which doesn't work with quorum queues - so people learn to expect performance issues.

link

cebert 1610 days ago

Is anyone else considering using DyanmoDB for event sourcing instead of Kafka? We have a project at work where we store events into an event table. We then had a stream that invokes a distributor lambda. The distributor lambda lookups subscribers to a given event type. For each subscriber for a given event, we place a copy of the event to a SQS queue for the event subscriber. Each subscriber lambda can process events from its own queue. If processing fails, each subscriber will retry processing a configured number of times before moving the failed event to a dead letter queue.

This approach has been working well for us without the need of Kafka with our serverless apps. I was curious if anyone is doing something similar.

link

mtberatwork 1610 days ago

Maybe I'm missing something but this setup sounds like a re-creation of Beanstalkd. https://beanstalkd.github.io/

link

cebert 1610 days ago

I haven’t heard of Beanstalkd before. I am just starting to read there site, but this sounds like a distributed job processing system? There’s some potentially some overlap. Our goal was to have a serverless form of event sourceing. We don’t have 1000s of subscribers to events (we aren’t FB), so our approach is working ok for us. Thanks for informing me about Beanstalkd, I will read up on it more.

link

mtberatwork 1610 days ago

If you look at the concept of "tubes" in beanstalkd, it seems to be quite similar to what you are doing in AWS. Each of your independent queues is just a "tube" in beanstalkd. Quite honestly though, you can do this same functionality with just about any message broker. No need to run all these separate services.

link

coffeeri 1610 days ago

Especially if you want one-to-many connections, exactly once delivery and push/ pull consumers, it is worth checking out nats [0, 1]. It is very performant. There are clients for Javascript, Python, Rust and Go [2]. (I am not affiliated with them)

[0] https://nats.io/ [1] https://github.com/nats-io/nats-server [2] https://nats.io/download/#nats-clients

link

birdayz 1610 days ago

that's just not true. NATS does have neither at-least-once nor exactly-once. NATS Jetstream has at-least-once, but it does not have flexible topics anymore like NATS. It's two completely different things.

link

coffeeri 1610 days ago

I was including NATS Jetstream, which has exactly-once message delivery [0].

[0] https://docs.nats.io/nats-concepts/jetstream#exactly-once-me...

link

bruth 1610 days ago

You may be thinking of NATS Streaming which was a predecessor to JetStream. It was indeed a standalone thing that used NATS under the hood. JetStream is now baked into the NATS server.

link

Lio 1610 days ago

I treat Redis a bit like I treat PostgreSQL.

It's what I use when I don't know what I should use.

(In fact these days I might even be tempted to start with a simple PostgreSQL based queue and only swap to Redis later if it becomes clear that's what's needed).

I guess a better approach might be to carefully analyse requirements up front but if those requirements aren't known at the time you start the project it's useful just to get you going.

link

altdataseller 1610 days ago

Same. Kafka might be better for larger datasets/throughput but a much harder to maintain and manage. Redis is much simpler if you know you aren’t going to scale that much

link

morelisp 1610 days ago

It's less about scale and more about availability - by the time I've set up a 3 Redis + 3 sentinel cluster it doesn't look operationally too different in complexity than a small Kafka cluster (and more complex once KRaft is ready). But a single non-HA Redis is way easier. (But if I don't need HA - a single memcache is easier yet...)

link

jpgvm 1610 days ago

This comparison is massively oversimplified and misleading.

The most important features are normally not message liveliness or throughput (complex routing can definitely be a defining feature however). This is because both are fairly easy to solve for - especially in absence of other constraints.

Much more important are durability, ordering, partitioning, acknowledgement models, fencing/isolation/failure of both brokers and consumers, etc.

These are all very nuanced things but ultimately determine which systems can be used for which applications.

A lot of people with rush to recommend Kafka but it's actually a rather narrow solution, it's distributed log model is definitely the right way to persist and replicate messages but it's fetch and consumer group APIs are essentially hot garbage for anything except strict streaming or other ordered processing cases.

This would be the major sharp edge of Kafka that people don't understand and end up pidgeon-holed into patching themselves - strict cumulative acknowledgement. This leads to head of line blocking and the only solutions involve tracking acknowledgements yourself either not using consumer groups at all or layering some inefficient solution ontop of it that only updates the offset appropriately and properly skips processed messages when recovering/rebalancing.

An alternative this article misses is Apache Pulsar which is much better suited for the role of "general purpose messaging system" that can just as easily function as a worker queue where ordering isn't important and supports various models of ordered consumption depending on your requirements.

I was also going to suggest LogDevice but it appears it's been abandoned/archived sadly.

Regardless ignore fluff articles like this. Understand the caveats of the Kafka API before going all-in, if your problem fits it's very simple/cost effective solution so it's worth it if the constraints don't bother you and you aren't annoyed by Confluent's stewardship.

Otherwise I would preference Pulsar, it's the more flexible option that you are unlikely to grow out of. Even as you get big it's natively multi-tenant and geo-replicated etc.

link

phoe-krk 1610 days ago

Just a fun question: is there a comparison that compares how well a simple PostgreSQL-based queue[0] fares against these specialized solutions?

[0] https://blog.crunchydata.com/blog/message-queuing-using-nati...

link

onphonenow 1610 days ago

You can actually get pretty fancy and lightweight with listen / notify as well if desired (use select for a notify instead of endless queries)

https://www.psycopg.org/docs/advanced.html#asynchronous-noti...

link

phoe-krk 1610 days ago

Yes, that too! I'd like to see how different Postgres techniques compare in general.

link

jpgvm 1610 days ago

I'm not sure of anyone covering all the approaches in depth but by far the best solution is to use advisory locks. Take a look at Que, there are implementations in Ruby/Go that I'm aware of but you could easily port it to anything else as it's just a bunch of pretty simple PostgreSQL + some session features.

link

NiekvdMaas 1610 days ago

It's worth noting that there are several projects built on top of Redis pub-sub that add more functionality, such as BullMQ: https://github.com/taskforcesh/bullmq

link

speedgoose 1610 days ago

Bull is great ! It's only NodeJS but I used it at work and it has been super easy to setup, use, and maintain.

link

beardedetim 1610 days ago

We use Bull at $CURRJOB and it works great until there's an issue and for _some reason_ jobs stall. Getting into the nitty gritty to debug the Lua code was not a fun week. We moved onto RMQ for our needs on the next project and are looking at ways the replace Bull/Redis in the old project.

link

emilsedgh 1610 days ago

Exactly our experience. We had stalled jobs and lost jobs on both Bull and Kue.

We wrote a celery-like library for Node[0] and RMQ has been rock solid.

https://www.npmjs.com/package/peanar

link

pards 1610 days ago

Another key feature missing from the list is:

4. Redelivery

Are messages redelivered if there's a failure in the consumer? This is critical to many systems using older messaging platforms like MQ.

link

humpydumpy 1610 days ago

why would you submit a link with tracking parameter?

link