
by webscalist 2226 days ago

RabbitMQ has a huge learning curve if you're trying to build a worker queue.

First, you'll learn about ack/noack and get the worker ack on success.
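To make the ack-on-success idea concrete, here is a toy in-memory model. It is not the RabbitMQ client API: `ToyQueue`, `deliver`, `ack`, `nack`, and `processOne` are made-up names for illustration. The point it tries to show is that a message stays tracked as "unacked" while a worker holds it, is dropped only once the worker acks, and goes back on the queue if the handler fails.

```javascript
// Toy in-memory model of manual acknowledgements. This is NOT the RabbitMQ
// client API; every name here is hypothetical.
class ToyQueue {
  constructor() {
    this.ready = [];          // messages waiting for delivery
    this.unacked = new Map(); // delivery tag -> message held by a worker
    this.nextTag = 1;
  }

  publish(message) {
    this.ready.push(message);
  }

  // Hand the next message to a worker; it stays tracked until acked or nacked.
  deliver() {
    if (this.ready.length === 0) return null;
    const message = this.ready.shift();
    const tag = this.nextTag++;
    this.unacked.set(tag, message);
    return { tag, message };
  }

  // Worker succeeded: forget the message.
  ack(tag) {
    this.unacked.delete(tag);
  }

  // Worker failed: put the message back so it can be delivered again.
  nack(tag) {
    if (!this.unacked.has(tag)) return;
    const message = this.unacked.get(tag);
    this.unacked.delete(tag);
    this.ready.push(message);
  }
}

// One step of a worker loop: ack only after the handler returns normally.
function processOne(queue, handler) {
  const delivery = queue.deliver();
  if (delivery === null) return false;
  try {
    handler(delivery.message);
    queue.ack(delivery.tag);
  } catch (err) {
    queue.nack(delivery.tag);
  }
  return true;
}
```

With auto-ack ("noack") the broker would forget the message at delivery time, so a crash inside the handler would lose it. That is the trade-off this model is meant to illustrate.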

Then, you'll learn about dead letter queue ... etc for delayed retries.

Now, you'll have a topic exchange and a bit hairy routing in place using wildcards.

And you mistakenly set dead letter routing key so that expired messages end up in multiple queues (retry queues and actual worker queue ... ).
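A rough sketch of how that mistake can happen, assuming AMQP-style topic matching where `*` stands for exactly one dot-separated word and `#` for zero or more words. The queue names and binding patterns below are invented for the example, and `topicMatches` is a plain reimplementation of the matching rule, not a RabbitMQ function.

```javascript
// Matches an AMQP-style topic binding pattern against a routing key.
// "*" stands for exactly one dot-separated word, "#" for zero or more words.
function topicMatches(pattern, routingKey) {
  const p = pattern.split(".");
  const k = routingKey === "" ? [] : routingKey.split(".");

  function match(i, j) {
    if (i === p.length) return j === k.length;
    if (p[i] === "#") {
      // "#" may absorb zero words (skip it) or one more word (stay on it).
      return match(i + 1, j) || (j < k.length && match(i, j + 1));
    }
    if (j === k.length) return false;
    return (p[i] === "*" || p[i] === k[j]) && match(i + 1, j + 1);
  }

  return match(0, 0);
}

// Returns the names of all queues whose binding pattern matches the key.
function queuesFor(bindings, routingKey) {
  return Object.keys(bindings).filter((name) =>
    topicMatches(bindings[name], routingKey)
  );
}

// Hypothetical bindings: queue name -> binding pattern.
const bindings = {
  "work": "jobs.#",
  "retry-30s": "jobs.*.retry",
};

// A dead-letter routing key that is too broad lands in both queues.
console.log(queuesFor(bindings, "jobs.email.retry")); // [ 'work', 'retry-30s' ]
console.log(queuesFor(bindings, "jobs.email"));       // [ 'work' ]
```

Running candidate routing keys through a small checker like this before changing bindings is one cheap way to catch a key that fans out further than intended.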

Then you rewrite your service in python and use Celery or something.

It's nearly impossible to get RabbitMQ working correctly within a few months.

And I forgot about HA. Paying for hosted RabbitMQ might be better. But CloudAMQP in particular could be tricky as well. It can run out of AWS IOPS and your production gets hosed.

Also, setting up monitoring on queue health, shoveling error queues, etc. takes time to learn and apply. Be careful about routing keys when you shovel an error queue to a topic.

11 comments

vasco 2226 days ago

Celery can be backed by RabbitMQ, not sure if that's what you meant, but all of what you described can be abstracted away. I didn't have the same experiences with months taken to get up to speed. Moreover, at work RabbitMQ is probably our most stable underlying tool, perhaps toe to toe with Redis. And that's saying a lot, since I consider Redis to almost be a piece of art in how great of a tool it is.

Back to RabbitMQ though: we have run an HA 2-node deployment (just one active writer) for over 3 years. It has required minimal changes and almost no maintenance, and it has scaled to a hundred-plus queues, ranging from some with super high numbers of messages per second to some with only tens of messages per day. Some queues stay low and process fast; others are heavy jobs that get enqueued all at once and generate hundreds of thousands of jobs.

Sure, if you have a service that interacts with disks you should have an automated monitor that covers your IOPS consumption, but I don't see how that's specific to RabbitMQ; you should be doing this for all your instances.

All in all, these are two identical instances, one active, one failover, and in a world of Kafkas and Pulsars and understanding the ins and outs of SQS pricing and capacity allocation, RabbitMQ is a tool that I consider simple to administer and that allows me to sleep at night.

Interesting how the same tool can evoke such different reactions, but whatever works - works.

tankerdude 2226 days ago

You would think, until you get to a split brain issue. The master and failover lose connectivity, and they each then think they're the master.

There's ways to repair it (and it has happened to me one total time in 4 years), but it does happen. I personally try to make my message processing idempotent for the worker to help alleviate these situations.
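A minimal sketch of idempotent processing, assuming every message carries a unique `id` field (an assumption; the thread does not say how messages are identified). `makeIdempotent` is a hypothetical helper, and the in-memory `Set` is only for illustration: a real worker would need a store that survives restarts and is shared between workers.

```javascript
// Wraps a handler so each message id has its side effects applied only once,
// even if the broker delivers the same message again after a failover.
function makeIdempotent(handler, seen = new Set()) {
  return function handle(message) {
    if (seen.has(message.id)) return false; // duplicate: skip side effects
    handler(message);
    seen.add(message.id); // record only after the handler succeeded
    return true;
  };
}

// Example: the same payment message delivered twice is applied once.
let balance = 0;
const applyPayment = makeIdempotent((msg) => { balance += msg.amount; });

applyPayment({ id: "m-1", amount: 10 });
applyPayment({ id: "m-1", amount: 10 }); // redelivery, ignored
applyPayment({ id: "m-2", amount: 5 });
console.log(balance); // 15
```

Recording the id only after the handler returns means a crash mid-handler leads to a retry rather than a silently skipped message, so the handler's own side effects still need to tolerate being partially repeated.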

411111111111111 2226 days ago

haven't encountered it personally, so honest question here: how does a split brain situation become an issue in a message queue?

there are some possible situation from my naive viewpoint:

1. the 'active' queue keeps jumping between, consumers & producers keep reconnecting

=> everything is still consumed, but takes longer as producers write into alternating queues, which are consumed ... albeit slowly whenever the switch happens

2. they're database backed, so they'll try to write into the same table

=> usually software that does this (but cant handle several writers) also creates a `lock` which has to be manually reset before the failover can come up. if its reset, the other node would fail. only one is up, so no issue?

3. producers/consumers dont notice that the 'active' mq changed, and keep running on initial

=> issue manifests as soon as any system is restarted. but only slowly so you got time to handle it with minor service degradation

none of them really sound that bad to me -- but as i said before, i haven't encountered it before, so i might just be overlooking something really obvious?

CloudButWhy 2226 days ago

There is a reason why you're supposed to run an odd number of nodes so that you will hopefully have a majority in case of a failure.
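The arithmetic behind that advice can be sketched as follows, assuming a rule where a partition side may keep serving only if it can reach a strict majority of the cluster. `hasMajority` and `sidesAllowedToServe` are illustrative names, not part of any broker's API.

```javascript
// A partition side may keep serving only if it sees a strict majority.
function hasMajority(reachableNodes, clusterSize) {
  return reachableNodes > clusterSize / 2;
}

// For a cluster split into two sides, report which sides may stay active.
function sidesAllowedToServe(clusterSize, sideA) {
  const sideB = clusterSize - sideA;
  return {
    a: hasMajority(sideA, clusterSize),
    b: hasMajority(sideB, clusterSize),
  };
}

console.log(sidesAllowedToServe(2, 1)); // { a: false, b: false }
console.log(sidesAllowedToServe(3, 2)); // { a: true, b: false }
console.log(sidesAllowedToServe(4, 2)); // { a: false, b: false }
```

With an even node count an even split leaves neither side with a majority, so the cluster either stops serving or, without such a rule, both sides carry on as master. With an odd count, any two-way split leaves exactly one side with a majority.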

yawaramin 2226 days ago

Once every four years sounds like a no-brainer, to be honest.

mythrwy 2226 days ago

I have simple single node deployment and I was floored how easy it was to set up with Celery. Really surprised. I was kicking myself for not using it sooner.

Granted I don't know all the intricacies of RabbitMQ and this was just one step beyond os.popen, but it was painless, like half an hour painless to set up and it has worked really well.

*edit: reading some of the other posts now I'm waiting for the other shoe to drop. but so far it's worked wonderfully.

robbiep 2226 days ago

I also got my first queue set up and running within a reasonable period of time with celery. I have no idea of the internals of RabbitMQ and took longer with celery really (back on python 2.7) but that system has been in prod for 6 years now without really needing any maintenance

m3nu 2226 days ago

Same experience. Single node with a few clients and Celery. Works well.

My main issue in the beginning was network timeouts now and then. Those went away after tuning some TCP settings.

LeonM 2226 days ago

Thank you for this post.

When I first started using RabbitMQ I experienced just about everything you described.

I felt incredibly stupid when a customer would have issues with a queue being stuck or messages that were being dropped, and having no clue on why this was happening.

> It's nearly impossible to get RabbitMQ working correctly within a few months.

This is so true. You can get it running in 10 minutes, but it takes weeks of banging your head against the wall and angry customers before you have it running right.

AmericanChopper 2226 days ago

I understand where you're coming from, but what you're describing is learning how to use a queue to maintain consistency guarantees across a distributed system. You can get something simple like AWS SQS working with a few clicks, but then you don't have any of those consistency guarantees.

tracker1 2225 days ago

If you don't need crazy throughput, I find that Azure Storage Queues are crazy easy, built in retry and just simple as can be. Though when I've used it in the past, I've created a slightly simpler to use abstraction.

https://docs.microsoft.com/en-us/azure/storage/queues/storag...

Thinking of doing something that works like an async generator so I can just use it like...

    const work = queue.subscribe('somequeue');
    for await (const {item, done} of work) {
      // do something with JSON.parsed item from message
      await done(); // wrapper for the delete/finish
    }
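
One way such a wrapper could look, as a sketch only. `MemoryQueue` is a hypothetical in-memory stand-in for a real storage queue client, and `subscribe` here takes a client object rather than a queue name, so it is not the abstraction described above and not the Azure SDK.

```javascript
// Hypothetical in-memory stand-in for a storage queue client.
class MemoryQueue {
  constructor(rawBodies) {
    this.messages = rawBodies.map((body, i) => ({ id: i, body }));
    this.deleted = []; // ids of messages that were marked as finished
  }

  // Resolves to the next message, or null when nothing is left.
  async receive() {
    const next = this.messages.shift();
    return next === undefined ? null : next;
  }

  async remove(id) {
    this.deleted.push(id);
  }
}

// Async generator: yields the parsed item plus a done() callback that
// deletes the message. It stops when the queue is empty; a long-running
// consumer would poll or wait here instead of returning.
async function* subscribe(client) {
  for (;;) {
    const msg = await client.receive();
    if (msg === null) return;
    yield {
      item: JSON.parse(msg.body),
      done: () => client.remove(msg.id),
    };
  }
}

// Consumes everything currently in the queue and returns the parsed items.
async function drain(client) {
  const items = [];
  for await (const { item, done } of subscribe(client)) {
    items.push(item);
    await done(); // delete only after the item was handled
  }
  return items;
}
```

Note that iterating an async generator needs `for await ... of`; `for ... in` enumerates property keys and does not work with `await`.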

AmericanChopper 2225 days ago

Azure Storage Queues are about on par with SQS. That is, easy to use, but lacking strict concurrency control. If you need that level of concurrency control (and stricter serialization), then you’d be better off with their (more complicated) service bus product [0].

RabbitMQ isn’t more complicated because it’s been improperly designed. It’s more complicated because it’s doing a much more complicated task.

A fair amount of that complexity lies in the hosting, so a managed service can take some of that off your hands (for an increased price obviously), but part of it is necessarily going to lie with the message consumer (your application logic). If your use case doesn’t need that level of control, then it doesn’t need that level of complexity either, so something like Rabbit would just be the wrong choice.

[0]: https://docs.microsoft.com/en-us/azure/service-bus-messaging...

gtaylor 2226 days ago

Depending on your usage patterns, SQS can be significantly more expensive, too.

rlander 2226 days ago

I have my own share of objections, mainly concerning the over-engineered nature of RabbitMQ, but most of the “huge learning curve” items that you’ve described can be learned in an afternoon by a motivated software engineer. Besides, she will have to learn those concepts anyway because they apply to most brokers.

emilsedgh 2226 days ago

You're right. It's difficult to get right. However, it is totally worth it. Once you get it working it just works.

I wish a standard set of higher abstractions existed on it though. Celery, from what I hear fills that gap very well in the python world but nothing like this exists in Nodejs land which leaves the room open for a bunch of redis-backed solutions which are pretty fragile in comparison.

csours 2226 days ago

I'm curious if you are comparing this to a non-queue solution or to a different queue system?

keithnz 2226 days ago

weird, I haven't done much digging into the details of RabbitMQ, but I integrated it in a matter of hours, and have had it deployed in production systems (for quite some time now) and it works really solidly. I haven't tried to get too clever though.

anon176 2226 days ago

You must have followed a good guide on getting it set up. Took me 2 days to get it solid. Then we decided to just use redis.

keithnz 2226 days ago

I just used the official docs and guides they had on the website, they seemed pretty good to me. I might have googled a few extra things, but can't really remember, I just remember it being pretty straightforward. I remember they pointed out a number of things you had to take care of.

symplee 2226 days ago

Can anyone recommend an easier alternative?

neurostimulant 2226 days ago

I switched to redis several years ago as a simple task queue solution. For my usage (low to medium traffic at most, in a corporate environment) redis is easier to use and has a very small cpu and ram footprint compared to rabbitmq (note that I only use redis as a message queue, hence the low memory consumption). I haven't had any messages dropped so far. RabbitMQ uses too much memory right from startup, which is not ideal on a resource-constrained server.

https://redislabs.com/ebook/part-2-core-concepts/chapter-6-a...

zmmmmm 2226 days ago

Surprised to see not much mention of ActiveMQ in these comments, but it's an obvious alternative choice. The general (simplistic) comparison being:

- ActiveMQ more featureful, robust default settings, better integrated with Java/JMS but slower

- RabbitMQ faster, simpler, more "just works"

The defaults of ActiveMQ lean more towards robustness (hence often naive benchmarks will tell you it's slow). However in practice it is pretty damn easy to run, you literally can just download the default cross-platform distribution and type `./bin/activemq` and it will start running.

We use ActiveMQ + Apache Camel which makes a pretty nice combo to achieve lots of generalised messaging and routing functionality.

mcsoft 2226 days ago

ies7 2226 days ago

I heard a lot of praise about Nats, but isn't it more like a kafka alternative? Someone new needs to spend some time grasping the stream concept.

mcsoft 2225 days ago

One practical reason we chose Nats over Kafka was that Nats doesn't need zookeeper for HA.

Nats doesn't provide message durability either; luckily it's not required for 95% of our use cases. Also, with NATS already in place, it's a natural move to use NATS Streaming for durability rather than introducing a completely new technology to your stack.

Fewer pieces, fewer chances something breaks.

jonathanoliver 2226 days ago

NATS by itself is designed to be more of an always-on style queuing system (the term they use is "dial tone") but doesn't handle node failures by itself. If you're looking for a Kafka-flavored NATS, there's a new release I saw recently called LiftBridge that adds some durability to the NATS protocol.

shaklee3 2225 days ago

Someone mentioned this already below, but nats streaming also adds durability.

rhodin 2226 days ago

Disclaimer: I work for CloudAMQP

Yeah we hear you regarding AWS IOPS: for some type of loads and smaller plans we need to offer an alarm + an easy way to scale IOPS. It is something we're working on.

jv22222 2226 days ago

The biggest annoyance I found with RabbitMQ was that it could take up to 10-15 mins to restart if it had a lot of jobs.

This was back in 2015 - might be better now.

kchr 2225 days ago

Sounds like resource (design) issues to me.

ayush--s 2226 days ago

isn't Rabbitmq prone to "split brain" problem on HA setups?