Hacker News
by jpgvm 1862 days ago
RabbitMQ went on my "never again" list of things after dealing with it for years as part of OpenStack and projects before that. At the time there weren't many viable alternatives to AMQP that were OSS and reasonable.

However, many split brains and grey hairs later, I decided RabbitMQ was almost never worth it, regardless of how many of AMQP's advanced features you could make use of.

For the longest time I just made do with Kafka, but this had serious deficiencies when implementing queues because of the cumulative-ack-only nature of Kafka.

Recently I have started using Pulsar, which provides selective ack and all the best parts of AMQP without the complexity and unneeded parts, i.e. it has things like scheduled delivery and TTLs in addition to the all-important shared subscription, which makes queues "just work" on top of streams.
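
To make the cumulative vs selective ack difference concrete, here is a rough Python sketch. It is a toy model, not Kafka or Pulsar client code: the function names and the list-of-booleans input are made up for illustration.

```python
def unacked_cumulative(processed_ok):
    """Cumulative ack: only a single 'read up to here' position can be
    committed, so everything from the first failure onward stays unacked,
    even messages that were processed successfully."""
    for offset, ok in enumerate(processed_ok):
        if not ok:
            return list(range(offset, len(processed_ok)))
    return []


def unacked_selective(processed_ok):
    """Selective ack: every message is acknowledged on its own, so only the
    messages that actually failed stay unacked."""
    return [offset for offset, ok in enumerate(processed_ok) if not ok]


# Message at offset 2 fails, the other four succeed.
results = [True, True, False, True, True]
print(unacked_cumulative(results))  # [2, 3, 4]
print(unacked_selective(results))   # [2]
```

With cumulative acks the one bad message holds back (or forces reprocessing of) everything behind it, which is what makes queue-style workloads awkward.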

If you want something like RabbitMQ but with a simpler API and are comfortable with JVM services, give Pulsar a go. It's not for everyone, but if you are already using a lot of the big data stack it's probably a good fit.

9 comments

Can only echo parent. 3 places of work of varying sizes, 4 projects of varying maturity, not a single RMQ administration staff that was competent enough to reliably run the cluster.

Which of course leads me to believe the problem isn't with the people but with the ridiculously high threshold of knowledge, experience and app developer self-control needed to run RMQ successfully.

As parent said, many meltdowns later, I'm now firmly in the "No Rabbit!" camp. Redis pubsub/queues for immediate lossy delivery, Kafka / GCP Pub/Sub / AWS SQS for less latency-sensitive flows that require more consistency guarantees.

I concur. I like RMQ because I know how to configure and administer it, but I would never trust anyone with it. My first exposure to it was on a project where the lead architect failed to read the documentation, let alone understand any of it. Several years later I was able to fix it and then left it in the hands of some other incompetents.

I feel dumb saying it, but as someone who's had a lot of experience with message buses (Rabbit and AMQP1.0), I've always struggled to understand what domains/situations Kafka is actually the best fit for. It's probably because the areas that I work in don't make it obvious, but I'd love to hear what exact scenarios it does actually make sense to use Kafka for instead of Rabbit or AMQP1.0 :)

RabbitMQ is one of those things that I've always found better to let the experts run (managed SaaS), unless your team is really wanting to take on the burden of becoming an Erlang distributed system debugger :)

Pulsar seems really interesting... There are now more managed Pulsar offerings coming online (StreamNative, DataStax who bought Kesque, Pandio, etc)

Well, they’re two different things. AMQP 1.0 is everyone-to-everyone messaging where every party can be a consumer and a server. RabbitMQ, traditionally, is a queue. You add a message, you take it and lock it, and if there is no processing confirmation, it is released back for someone else to process. Kafka is an append-only log. You put a message in and consumers just roll over them. Rabbit/AMQP is random access; Kafka sucks at random access. With AMQP, you’ll have hundreds/thousands of queues; this may be difficult with Kafka.
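
A rough sketch of the take/lock/release behaviour described above, as a toy in-memory Python class. `LockingQueue` and its method names are hypothetical and not any broker's API; a real broker would also release a message on consumer disconnect or timeout.

```python
from collections import deque


class LockingQueue:
    """Toy queue: a taken message is locked until it is acked or released."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._in_flight = {}    # delivery tag -> message currently locked
        self._next_tag = 0

    def publish(self, body):
        self._ready.append(body)

    def take(self):
        """Hand out the next message and lock it; returns (tag, body) or None."""
        if not self._ready:
            return None
        body = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self._in_flight[tag] = body
        return tag, body

    def ack(self, tag):
        """Processing confirmed: the message is gone for good."""
        del self._in_flight[tag]

    def release(self, tag):
        """No confirmation: put the message back for someone else to process."""
        self._ready.appendleft(self._in_flight.pop(tag))


queue = LockingQueue()
queue.publish("job-1")
queue.publish("job-2")

tag, body = queue.take()   # first consumer locks job-1
queue.release(tag)         # it never confirms, so job-1 is released
print(queue.take()[1])     # job-1 is delivered again
```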

You’d use Kafka more as an unbounded buffer and build different paradigms on top of it. It's not unusual to ingest 100s of Mbits of data into Kafka, potentially saturating the network while also reading that much out. AMQP is better for a large number of queues where each queue has fewer messages in it. Think MQTT, WebSockets: many, many consumers.

It would be reasonable to use both next to each other.

But I’d never go for RabbitMQ. I’d go for Azure Service Bus or Artemis with Qpid.

Thank you, that is quite helpful :)

Kafka is distinctly different from enterprise messaging systems like AMQP.

I generally think of messaging systems as falling into 4 distinct categories: PubSub, Streaming, Queues and Enterprise Messaging Systems.

PubSub systems are focused on non-durable (usually), low-latency messaging, generally without acknowledgements and generally at-most-once, e.g. things like Redis PUBSUB, NATS, etc.

Queues are generally focused on fanout to multiple consumers with at-least-once processing of durable messages with acknowledgements, e.g. Celery/Sidekiq, Que, AWS SQS.

Streaming systems are designed for throughput and are usually based on some form of a distributed log concept. They generally offload offset management to consumers, e.g. Kafka, Kinesis.

Enterprise Messaging Systems favor flexibility above all else and usually have some mechanism for encoding the flow of data separately from the applications themselves, e.g. exchange routing topologies in AMQP. They can generally implement pubsub, queues and direct messaging paradigms. The tradeoffs are poorer availability, complexity and poor performance vs specialised systems, e.g. RabbitMQ, HornetQ, etc.

So you end up using Kafka when its limitations aren't a problem and you need the throughput. It works best when every message in a stream is homogeneous, so that a failure to process one message is unlikely to be independent of a failure to process the following message. This alleviates the main drawback of streaming systems, which is head-of-line blocking.

Some cases where it works very well are event streams, data replication/CDC, etc.

Thank you, that is very helpful :)

> "So you end up using Kafka when its limitations aren't a problem and you need the throughput."

I think that's what I've run into so far: my use cases haven't needed the throughput of Kafka, so all I'm left with is the feature gaps that I'd miss from something like AMQP1.0

Apache Pulsar looks pretty interesting :)

> I'd love to hear what exact scenarios it does actually make sense to use Kafka for instead of Rabbit or AMQP1.0 :)

If the order of message processing matters, then Kafka is better suited than AMQP. For example, in a distributed application for money transfers, if AMQP is used, message order will be lost and problems will occur in the following scenario:

User A, with an account balance of $1000, orders two transfers, T1 ($600) and T2 ($500)

  - Rabbit delivers T1 to server1; before processing the message, server1 enters a full GC.
  - Rabbit delivers T2 to server2 and server2 processes the message immediately; User A's account now has $500.
  - Server1 resumes after the end of GC, but fails to process T1 since the account's balance is less than the required amount.

However, it is T2 that should have failed, because User A ordered T1 first and T2 after.

In Kafka, when the user account identifier is used as the partitioning key, all of User A's messages will be processed by the same consumer (i.e. server1), so even if server1 enters a full GC, that is OK, since T2 will be processed after T1.
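
A small Python sketch of why keying by account preserves order. Everything here is an illustrative assumption: `partition_for` uses CRC32 as a stand-in hash (Kafka's own default partitioner uses a different hash function), and `apply_transfers` models each partition as a list that a single consumer works through in order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # arbitrary for the example


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Same key -> same partition, every time (stand-in hash, see above)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def apply_transfers(transfers, balances):
    """Route transfers to partitions by account, then process each partition
    strictly in arrival order, as a single consumer would."""
    partitions = defaultdict(list)
    for account, transfer_id, amount in transfers:
        partitions[partition_for(account)].append((account, transfer_id, amount))

    outcomes = {}
    for partition in partitions.values():
        for account, transfer_id, amount in partition:
            if balances[account] >= amount:
                balances[account] -= amount
                outcomes[transfer_id] = "ok"
            else:
                outcomes[transfer_id] = "insufficient funds"
    return outcomes


balances = {"user-a": 1000}
transfers = [("user-a", "T1", 600), ("user-a", "T2", 500)]
print(apply_transfers(transfers, balances))
# {'T1': 'ok', 'T2': 'insufficient funds'}
```

Because both transfers share the key `user-a`, they land in the same partition and T1 is always applied before T2, however slow the consumer is.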

FWIW, AMQP1.0 does support Sessions which can be used to address this scenario :)

Kafka was designed for scale and developed at LinkedIn to meet their extremely high throughput needs. It's a distributed log, basically writing to an append-only file that's partitioned by a hash of a key (that can be set on every message).

It makes the brokers as dumb as possible to optimize for performance and the logic sits in the client. You can ask to read back from the log at any point or at the current tail. Acknowledging messages is just writing a bookmark to another topic saying where you last read up to, or you can keep track of it yourself somewhere else.

You can always build more complex logic on top, which Confluent has done with things like ksqlDB.

Pulsar definitely looks like a great combo of capabilities from Kafka + AMQP. I have been wanting to try it out with some of our stuff, but inertia/time constraints have made it hard to consider moving away from Rabbit.

I'm curious what makes it go on your "never again" list? We've definitely had our fair share of issues with it, namely -

- Really easy to misconfigure queues/exchanges, especially trying to do something like have a retry + DLQ setup.

- If you have a queue build up to a large number of messages (100 million+) for whatever reason, purging it will probably bring down the cluster.

Overall, our experience has been mostly positive. It isn't on my "never again" list, but I'm definitely wary of some parts of it, and it is one of the more difficult pieces of our infrastructure to scale.

Pulsar is great but also a heavy install. There are lots of messaging products now and I would recommend NATS as the best 1:1 replacement of RabbitMQ if you need a message broker with advanced routing: https://nats.io/

> For the longest time I just made do with Kafka but this had serious deficiencies when implementing queues because of the cumulative ack only nature of Kafka.

We built a Kafka consumer that's effectively capable of selective acks by producing bad messages to separate topics. It's a little silly but it works.
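
Roughly, the idea looks like the Python sketch below. This is a toy model and not our actual consumer or any Kafka client API: `consume_with_dlq` is a made-up name, the input "topic" and the dead-letter "topic" are plain lists, and the retry count is arbitrary.

```python
def consume_with_dlq(messages, handler, max_attempts=3):
    """Process messages in offset order. A message that keeps failing is
    copied to a dead-letter list so the committed offset can still advance
    past it, which approximates a selective ack on top of cumulative acks."""
    committed_offset = 0
    dead_letters = []
    for offset, message in enumerate(messages):
        for attempt in range(1, max_attempts + 1):
            try:
                handler(message)
                break  # success, stop retrying
            except Exception as exc:
                if attempt == max_attempts:
                    # Give up: park the message (and the reason) elsewhere.
                    dead_letters.append((offset, message, str(exc)))
        # Either handled or dead-lettered, so it is safe to move on.
        committed_offset = offset + 1
    return committed_offset, dead_letters


def handler(message):
    if message == "bad":
        raise ValueError("unparseable")


committed, dead = consume_with_dlq(["a", "bad", "c"], handler)
print(committed)  # 3
print(dead)       # [(1, 'bad', 'unparseable')]
```

The bad message no longer blocks the ones behind it; something else has to drain and retry the dead-letter topic later.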

DLQ is the way to do this. Nothing silly about it.

It's a heavyweight custom client to hack Kafka into being something it isn't, that other message brokers are. That's what's silly.

Wow, and here I was feeling like I was the only one who thought this way! Such a similar experience to yours.

While RabbitMQ is not a “never again” for me, I agree that making it work reliably does involve a few knobs and architectural tricks. I used AMQP/Rabbit/Kafka previously (in that order), and have switched to Pulsar where I can since 2017 ( https://stackoverflow.com/a/47477765 ).

It has been great overall.

So good that I recently decided to slow down client work and build a managed SaaS offering for Pulsar: https://turtlequeue.com It is a work in progress; however, it is a bit different from the nascent Pulsar offerings out there.

The main goals are ease of use and being cheap. How do I go about it?

1. Behind the scenes there is only one Pulsar cluster. This lowers the costs of hosting dramatically. Even the smallest production Pulsar cluster requires:

  - ZooKeeper node(s)
  - Bookie node(s)
  - Broker node(s)
  - (optional) Function worker node(s)
  - (optional) Proxy node(s)
  - Pulsar Manager
  - Prometheus
  - Grafana

Typically this runs on top of Kubernetes these days, so throw in volume storage and a LoadBalancer. Hosting small setups is costly. By having a shared cluster I can lower the costs enough to provide a free “try me” service at little to no cost to me. And nobody will suffer from the “noisy neighbour” problem, as Pulsar is designed to be multi-tenant and can enforce limits per tenant.

2. Tq (turtlequeue) users do not have to care about how the cluster operates (typical SaaS). It is also dramatically easier for me to monitor and operate only one cluster.

3. How do I expose this safely and make it easy for users to use Pulsar then? Experienced Pulsar users will notice that this is not easy to do at the moment with Pulsar. I am developing a custom proxy! This in turn allows me to collect metrics, enforce finer permissions, and present a nicer dashboard.

Where am I now? The custom proxy works; the website/docs/login/dashboard/metrics/pricing need a lot of TLC. So “soon”. I will be looking for beta testers; if you are interested please email turtle@turtlequeue.com. Feel free to email me too if you just want to be kept in the loop :)

Hmm, what issues did you run into? I've used it on a few projects in a mirrored way and it was always fine. Is the clustering the issue?

Not OP, but that is my experience. It was rock solid on a single server. Clustering brought us issues stemming from its complexity: split-brain scenarios, corruption of the Mnesia database and such. We went back to single server mode.

> Mnesia

Is Mnesia just terrible or is there some trick? I did run into issues you're talking about with ejabberd clusters.

Same! Never bothered with the so-called HA setup after running a cluster for a few months. Making all messages durable + backups of underlying storage are sufficient; while they do not prevent an outage, at least bringing the system back to an operational state is fairly straightforward.

I know workplaces where you can get sacked just for mentioning RabbitMQ...