HN Mirror

by kungfufrog 1436 days ago

If I had to nominate a piece of software, as an SRE, that is as close to "set and forget" as possible, it'd absolutely be RabbitMQ standalone and a close runner up would be its clustered form.

I've worked in 3 places where RabbitMQ has been a fundamental cornerstone of the architecture, and while it does require a little tuning around performance occasionally (generally because it's being used inappropriately or without full consideration of its limitations/best practices), it's rock solid, easy to debug/inspect, has an active and supportive community, and is generally just all around pleasant to work with and maintain.

Kudos to RabbitMQ and its developers!

As an aside, the addition of recent features such as super streams, streams and quorum queues makes it a compelling all-in-one tool for solving a bunch of architecture-related concerns/requirements in application and infrastructure development. I've often wondered why it's not used more on the ops side of the world for metrics gathering and other use cases. I have also wondered how useful it'd be for log ingestion with lazy queues etc.

Does anyone out there have examples of unusual use cases for RabbitMQ where it's outshone some alternative product? Would love to hear about them!

6 comments

jsmeaton 1436 days ago

I feel like "and while it does require a little tuning around performance occasionally" is doing a lot of heavy lifting there :)

Honestly though my only experience with RabbitMQ has been as a backend for Celery (background task processor for Python) and I think my real issues are mostly to do with how Celery uses RabbitMQ in a very poor default setup.

Message confirmations are off by default, and turning them on caused our queues to grind to a halt. Queues are single-threaded, and clustering doesn't help much with that unless you use some kind of sharding plugin. It seemed like getting to a good spot required a lot of arcane knowledge that wasn't so easy to find.

"Configuring RabbitMQ to be a performant message queue for background task systems" would be an excellent blog post I would share widely!

aeyes 1436 days ago

RabbitMQ cluster mode for classic queues is not a true HA solution. If a node gets replaced, the content of your queue has to be synchronized to the new node, and while this is running you can't produce to the queue, which amounts to an outage. Unfortunately, this synchronization method is also unreliable if you have a significant amount of messages (a few GB) in the queue; it often crashes nodes with no way to recover the internal database. And even if everything works, you still have a chance of lost messages or lost acks (to be fair, this is documented). It is so bad that it is now officially deprecated.

This also makes online upgrades extremely hard; the only acceptable way is to stand up a new cluster, switch producers and consumers over, and then shovel the data from the old cluster (which is also not too reliable).

They came up with quorum queues which have less features and require to keep all messages in memory. I don't like having servers with 90% unused memory for that one event where I actually need to queue a lot of messages because a consumer is broken.

I would never pump logs through RabbitMQ. If you get into a situation where you accumulate a large amount of data in a queue, you will face trouble sooner or later. Most likely RabbitMQ will run out of memory and "flow-control" producers, and you'll have an outage you can't recover from.

captenjoyable 1436 days ago

> They came up with quorum queues which have less features and require to keep all messages in memory.

I don't think this is true

> Quorum queues store their message content on disk (per Raft requirements) and only keep a small metadata record of each message in memory. This is a change from prior versions of quorum queues where there was an option to keep the message bodies in memory as well. This never proved to be beneficial especially when the queue length was large.

https://www.rabbitmq.com/quorum-queues.html#:~:text=Quorum%2....

aeyes 1436 days ago

You are correct, nice to see that this was improved.

I should find time to run some tests with quorum queues; this now actually looks usable. But in the end we will have to see how stable it is running production workloads.

dalyons 1436 days ago

Interesting, I have the opposite experience. When we got to high throughput in a cluster, we had all sorts of crashes, partitions, and nasty failure modes that required stressful, delicate rebuilds. It has similar challenges to a clustered multi-master (M-M) relational DB. We moved the high-volume events to Kinesis, which was far more reliable for that use case.

At low volumes yeah sure, it just ticks along, and does what you want.

amock 1436 days ago

I also found it to be unreliable.

I only used it at one company, but it was the most unreliable and hard to diagnose piece of our infrastructure. We didn't even have high throughput and I wouldn't use it again without a really good reason.

pram 1436 days ago

God yes, I've seen so many split-brain issues with Rabbit and ActiveMQ (what a baffling pile of crap) over the years.

At least you can tell Kafka to block producers if replicas aren't clean.
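One way to read "block producers if replicas aren't clean": in Kafka, a producer that sends with acks=all has its writes rejected while the partition has fewer in-sync replicas than the min.insync.replicas setting. The snippet below is only a toy model of that rule in plain Python, not Kafka client code; the function name accept_write and its parameters are hypothetical.

```python
def accept_write(acks, in_sync_replicas, min_insync_replicas):
    """Toy model: would the partition leader accept this produce request?

    The min.insync.replicas check only applies to producers using acks="all";
    producers using weaker acks settings are not held back by it.
    """
    if acks != "all":
        return True
    return in_sync_replicas >= min_insync_replicas


# Assumed setup: replication factor 3, min.insync.replicas = 2.
healthy = accept_write("all", in_sync_replicas=3, min_insync_replicas=2)
degraded = accept_write("all", in_sync_replicas=1, min_insync_replicas=2)
weak_acks = accept_write("1", in_sync_replicas=1, min_insync_replicas=2)
```

In this model the producer is refused in the degraded case instead of writing to a single surviving replica, which is the trade of availability for safety being described.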

jorl17 1433 days ago

This! I tell people to run from RabbitMQ! At a previous job we had serious issues caused by split brain!

florbo 1436 days ago

What's low volume? At one point I used RabbitMQ to ingest ~20,000 messages per second of varying length, from Tweets to blog posts, all containing the full content of the activity with metadata. It was a 3-node cluster. I wish I remembered the specs, but it was nothing crazy aside from the SSD IOPS. The one time it fell over was when the consumers were broken long enough to fill up the disks.

dalyons 1435 days ago

Around there and upwards. You've listed one of the big problems with Rabbit at volume: inevitably, you are going to have consumers go down or go slow. At a high enough volume you're heading for a crash/partition quickly if you can't respond fast enough (where "fast enough" is a time window inversely proportional to the queue's volume). It's a crappy failure mode to have a sword hanging over you like that.

Other, log-based messaging technologies like Kinesis, Kafka, etc. do not care if a consumer goes down and are thus much safer.
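The "time window inverse to volume" point can be made concrete with a little arithmetic. This is a back-of-the-envelope sketch: the ~20,000 msg/s rate comes from the thread, while the free disk space and average message size are made-up assumptions, and seconds_until_full is a hypothetical helper name.

```python
def seconds_until_full(free_bytes, msgs_per_second, avg_msg_bytes):
    """Time until a backlog fills the free space when nothing is being consumed."""
    return free_bytes / (msgs_per_second * avg_msg_bytes)


FREE_BYTES = 500 * 10**9   # assumption: 500 GB of free disk across the cluster
AVG_MSG_BYTES = 2_000      # assumption: 2 kB average message

# Ten times the publish rate leaves one tenth of the time to react.
for rate in (2_000, 20_000, 200_000):
    hours = seconds_until_full(FREE_BYTES, rate, AVG_MSG_BYTES) / 3600
    print(f"{rate:>7} msg/s -> {hours:.1f} h until the disk is full")
```

Under these assumptions, 20,000 msg/s gives roughly three and a half hours between the consumers stopping and the disks filling up.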

florbo 1435 days ago

In my case it would have been an issue regardless of what the queue/pubsub tech was (talking on-prem, not GCP Pubsub or AWS, which would just chug along effortlessly, not care, and take our money), since the entire consumer stack was toast and dumping unprocessed data was a no-no. The real issue there was my manager not having a spine coupled with not allowing my team to do its job autonomously. Even with the wonky setup we had it would have been dead simple to chain additional clusters. Stupid but easy with the automation we had built.

However, adding another Kafka or Pulsar node would have been much easier.

cssanchez 1436 days ago

I've read so much about RabbitMQ, yet I still don't understand what it actually does. I know it is a 'message broker', but I don't understand what that means. To me it sounds like it's a backend for messaging that's easy to integrate with any user account module?

blep_ 1436 days ago

It's messaging in the sense of message passing between processes. You put messages into it when you want them to be processed asynchronously.

They go into queues based on attributes of the messages ("routing keys") matched against rules you set up ("bindings").

Other things pick up messages off of those queues and process them.

When they're done, they acknowledge the message and it's removed from the queue. If they crash, the message doesn't get acknowledged and it goes back to the queue.
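The flow described above (publish with a routing key, bindings decide the queue, a consumer acknowledges or the message goes back) can be sketched as a tiny in-memory model. This is not RabbitMQ's API or a client library; ToyBroker and all of its methods are hypothetical names, and it only matches routing keys exactly.

```python
from collections import deque


class ToyBroker:
    """In-memory sketch of routing plus acknowledgements (not RabbitMQ's API)."""

    def __init__(self):
        self.bindings = {}   # routing key -> list of queue names
        self.queues = {}     # queue name -> deque of ready messages
        self.unacked = {}    # delivery tag -> (queue name, message)
        self._next_tag = 0

    def bind(self, queue, routing_key):
        self.queues.setdefault(queue, deque())
        self.bindings.setdefault(routing_key, []).append(queue)

    def publish(self, routing_key, body):
        # Copy the message into every queue bound to this routing key.
        for queue in self.bindings.get(routing_key, []):
            self.queues[queue].append(body)

    def get(self, queue):
        # Hand out the oldest ready message and remember it until it is acked.
        if not self.queues[queue]:
            return None
        self._next_tag += 1
        body = self.queues[queue].popleft()
        self.unacked[self._next_tag] = (queue, body)
        return self._next_tag, body

    def ack(self, tag):
        # The consumer finished: forget the message for good.
        del self.unacked[tag]

    def requeue(self, tag):
        # The consumer died before acking: put the message back at the front.
        queue, body = self.unacked.pop(tag)
        self.queues[queue].appendleft(body)


broker = ToyBroker()
broker.bind("emails", routing_key="user.signup")
broker.publish("user.signup", "welcome alice")

tag, body = broker.get("emails")   # a worker picks the message up
broker.requeue(tag)                # ...and crashes before acknowledging it
tag, body = broker.get("emails")   # another worker receives the same message
broker.ack(tag)                    # processed; the queue is now empty
```

A real broker detects the crash itself (for example when the consumer's connection drops); here the requeue call stands in for that.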

enasterosophes 1436 days ago

I think of it as the nervous system of our infrastructure. Our infra has all these moving parts like schedulers, storage, virtual machines, networking and so on. Rabbit is the thing that all the moving parts use to coordinate with all the other moving parts.

fmorel 1436 days ago

Absolutely. My team uses .NET, so we use the NServiceBus library on top of it, and RabbitMQ has been rock-solid. We never have to think about it; it's just been running for years.

POPOSYS 1436 days ago

Do you have experiences with other software that might be an alternative to RabbitMQ?