by dalyons 1436 days ago
Interesting, I have the opposite experience. When we got to high throughput in a cluster, we had all sorts of crashes, partitions, and nasty failure modes that required stressful, delicate rebuilds. It has challenges similar to a clustered multi-master (M-M) relational DB. We moved the high-volume events to Kinesis, which was far more reliable for that use case.

At low volumes, yeah, sure, it just ticks along and does what you want.


I also found it to be unreliable.

I only used it at one company, but it was the most unreliable and hardest-to-diagnose piece of our infrastructure. We didn't even have high throughput, and I wouldn't use it again without a really good reason.

God yes, I've seen so many split-brain issues with Rabbit and ActiveMQ (what a baffling pile of crap) over the years.

At least you can tell Kafka to block producers if replicas aren't clean.
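The knob being referred to is presumably the combination of the producer setting `acks=all` with the topic/broker setting `min.insync.replicas` (usually alongside `unclean.leader.election.enable=false`, so an out-of-sync replica is never promoted to leader). The sketch below is a toy Python model of that rule, not Kafka's client or broker API; `PartitionState` and `accept_write` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    """Hypothetical, simplified view of one Kafka partition's replica health."""
    replication_factor: int
    in_sync_replicas: int      # size of the ISR set, leader included
    min_insync_replicas: int   # mirrors the min.insync.replicas setting


def accept_write(p: PartitionState, acks: str) -> bool:
    """Roughly how a broker decides whether to accept a produce request.

    With acks="all", the write is refused (the real broker answers with a
    NotEnoughReplicas error) when fewer than min.insync.replicas are in sync.
    With acks="0" or acks="1", the ISR size is not checked at all.
    """
    if acks == "all":
        return p.in_sync_replicas >= p.min_insync_replicas
    return True


healthy = PartitionState(replication_factor=3, in_sync_replicas=3, min_insync_replicas=2)
degraded = PartitionState(replication_factor=3, in_sync_replicas=1, min_insync_replicas=2)
print(accept_write(healthy, "all"))   # True
print(accept_write(degraded, "all"))  # False: producer gets an error instead of a silent write
print(accept_write(degraded, "1"))    # True: acks=1 does not enforce the ISR minimum
```

The trade-off is availability for consistency: while replicas are unhealthy, producers see errors (and retry) rather than writing data that only one broker holds.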

This! I tell people to run from RabbitMQ! At a previous job we had serious issues caused by split brain!

What's low volume? At one point I used RMQ to ingest ~20,000 messages per second of varying length, from tweets to blog posts, all containing the full content of the activity with metadata. It was a 3-node cluster. I wish I remembered the specs, but nothing crazy aside from the SSD IOPS. The one time it fell over was when the consumers were broken long enough to fill up the disks.
Around there and upwards. You've listed one of the big problems with Rabbit at volume: inevitably and unavoidably, you are going to have consumers go down or get slow. At a high enough volume you're heading for a crash or partition quickly if you can't respond fast enough (where "fast enough" is a time window inversely proportional to the queue's volume). It's a crappy failure mode, having a sword hanging over you like that.
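The "time window inverse to volume" point is simple arithmetic: with consumers stopped, the backlog grows at ingest rate times message size, so the time until the disk fills is free space divided by that rate. A minimal sketch, assuming every message is persisted, nothing is acked or deleted while consumers are down, and ignoring replication and per-message overhead. The 20,000 msg/s figure is from the comment above; the 4 KiB average size and 500 GB of free disk are made-up numbers for illustration. (RabbitMQ's disk alarm, governed by `disk_free_limit`, blocks publishers somewhat before the disk is literally full.)

```python
def seconds_until_disk_full(free_bytes: float, msgs_per_sec: float, avg_msg_bytes: float) -> float:
    """Time before an unconsumed backlog fills the disk.

    Assumes consumers are fully stopped, so the backlog grows linearly
    at msgs_per_sec * avg_msg_bytes bytes per second.
    """
    ingest_bytes_per_sec = msgs_per_sec * avg_msg_bytes
    return free_bytes / ingest_bytes_per_sec


# Hypothetical inputs: 500 GB free, 20,000 msg/s, 4 KiB average message.
t = seconds_until_disk_full(500e9, 20_000, 4096)
print(f"{t / 3600:.2f} hours")  # about 1.7 hours
# Doubling the volume halves the window you have to fix the consumers.
print(f"{seconds_until_disk_full(500e9, 40_000, 4096) / 3600:.2f} hours")
```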

Log-based messaging technologies like Kinesis, Kafka, etc. don't care if a consumer goes down, and are thus much safer.
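The reason log-based systems "don't care" is that the broker never waits on consumers: each consumer just remembers an offset, and old records are deleted by a retention policy whether or not anyone read them. Below is a toy Python sketch of that idea, assuming size-based retention and that a new consumer starts from the oldest retained record; `RetentionLog` is a hypothetical class, not the Kafka or Kinesis API (both also support time-based retention).

```python
from collections import deque


class RetentionLog:
    """Toy append-only log with size-based retention (hypothetical, for illustration).

    Consumers only store an offset; the log drops its oldest records once
    max_records is exceeded, regardless of whether anyone has read them.
    """

    def __init__(self, max_records: int):
        self.max_records = max_records
        self.records = deque()
        self.start_offset = 0        # offset of the oldest retained record
        self.consumer_offsets = {}   # consumer name -> next offset to read

    def append(self, value) -> int:
        self.records.append(value)
        if len(self.records) > self.max_records:
            self.records.popleft()   # retention: drop the oldest record, never block
            self.start_offset += 1
        return self.start_offset + len(self.records) - 1  # offset of the new record

    def poll(self, consumer: str, max_n: int = 10) -> list:
        # New consumers start at the oldest retained record.
        offset = self.consumer_offsets.get(consumer, self.start_offset)
        # A consumer that fell behind retention silently skips the expired records.
        offset = max(offset, self.start_offset)
        i = offset - self.start_offset
        batch = list(self.records)[i:i + max_n]
        self.consumer_offsets[consumer] = offset + len(batch)
        return batch


log = RetentionLog(max_records=3)
for i in range(5):
    log.append(i)
print(log.poll("slow"))  # [2, 3, 4]: records 0 and 1 aged out while "slow" was away
```

The failure mode moves from "the broker falls over" to "unread data quietly expires", which is safer for the cluster but is exactly what you don't want if dropping unprocessed data is unacceptable.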

In my case it would have been an issue regardless of the queue/pub-sub tech (talking on-prem, not GCP Pub/Sub or AWS, which would just chug along effortlessly, not care, and take our money), since the entire consumer stack was toast and dumping unprocessed data was a no-no. The real issue there was my manager not having a spine, coupled with not allowing my team to do its job autonomously. Even with the wonky setup we had, it would have been dead simple to chain additional clusters. Stupid, but easy with the automation we had built.

However, adding another Kafka or Pulsar node would have been much easier.