| RabbitMQ has a huge learning curve if you're trying to build a worker queue. First, you'll learn about ack/noack and get the worker to ack on success. Then you'll learn about dead letter queues, etc., for delayed retries. Now you'll have a topic exchange and some hairy routing in place using wildcards. And you mistakenly set the dead letter routing key so that expired messages end up in multiple queues (retry queues and the actual worker queue ...). Then you rewrite your service in Python and use Celery or something. It's nearly impossible to get RabbitMQ working correctly within a few months. And I forgot about HA. Paying for hosted RabbitMQ might be better. But CloudAMQP in particular could be tricky as well. It can run out of AWS IOPS and your production gets hosed. Also, setting up monitoring on queue health, shoveling error queues, etc. takes time to learn and apply. Be careful about routing keys when you shovel an error queue to a topic. |
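For anyone who hasn't hit the dead letter routing mistake described in the quote, here is a minimal sketch of how it happens. It only models the topic exchange matching rule (`*` matches exactly one word, `#` matches zero or more) in plain Python; it does not talk to a broker. The queue names, binding patterns, and routing keys are all made up for illustration.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style topic pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may swallow zero or more words of the key.
            return any(match(i + 1, j2) for j2 in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            # '*' matches exactly one word; otherwise words must be equal.
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


def routed_queues(bindings: dict, routing_key: str) -> list:
    """List the (hypothetical) queues whose binding matches the key."""
    return sorted(q for q, pat in bindings.items() if topic_matches(pat, routing_key))


# Hypothetical bindings: the worker queue uses an overly broad wildcard.
bindings_bad = {"worker": "jobs.#", "retry": "jobs.retry.*"}
# An expired message dead-lettered with a retry-style key hits both queues.
print(routed_queues(bindings_bad, "jobs.retry.email"))

# Narrower worker binding plus a dedicated dead letter routing key.
bindings_fixed = {"worker": "jobs.work.*", "retry": "jobs.retry.*"}
print(routed_queues(bindings_fixed, "jobs.work.email"))
```

The first call lists both queues and the second lists only the worker queue. The same check is worth doing on paper before shoveling an error queue into a topic exchange: write down the key the message will carry and every binding it could match.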
Back to RabbitMQ though: we have been running an HA two-node deployment (just one active writer) for over 3 years. It has required minimal changes and almost no maintenance, and it has scaled to a hundred-plus queues, ranging from some with super high numbers of messages per second to some with only tens of messages per day. Some queues stay short and process fast; others are heavy jobs that get enqueued all at once and generate hundreds of thousands of jobs.
Sure, if you have a service that interacts with disks, you should have an automated monitor that covers your IOPS consumption, but I don't see how that's specific to RabbitMQ; you should be doing this for all your instances.
All in all, these are two identical instances, one active and one failover. In a world of Kafkas and Pulsars, and of having to understand the ins and outs of SQS pricing and capacity allocation, RabbitMQ is a tool that I consider simple to administer and that lets me sleep at night.
Interesting how the same tool can evoke such different reactions, but whatever works, works.