by josephg 2593 days ago
I’ve long had the inverse view - I’m not sure what good use cases there are for RabbitMQ that couldn’t be handled better by a Kafka cluster.

One company I worked with used Kafka as their central source of truth across the organisation. All events generated by users were thrown into a massive Kafka cluster. Each team in the organisation cared about a different view into that data (financials, marketing, fraud, what we display to that user on the website, etc). Each team would ingest the same Kafka stream and do different things with it - often consuming certain events into their own Postgres instance, or something similar.
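A rough sketch of that pattern, for anyone who hasn't seen it: one append-only log, with each team tracking its own read position. This is a toy in-memory stand-in, not the Kafka client API; `EventLog`, `poll`, the group names and the event fields are all made up for illustration.

```python
from collections import defaultdict


class EventLog:
    """Toy in-memory stand-in for a single append-only log (hypothetical, not Kafka)."""

    def __init__(self):
        self._events = []                 # append-only list of events
        self._offsets = defaultdict(int)  # next offset to read, per consumer group

    def append(self, event):
        self._events.append(event)

    def poll(self, group):
        """Return the events this group has not seen yet and advance its offset."""
        start = self._offsets[group]
        batch = self._events[start:]
        self._offsets[group] = len(self._events)
        return batch


log = EventLog()
log.append({"type": "purchase", "user": "alice", "amount": 30})
log.append({"type": "login", "user": "bob"})
log.append({"type": "purchase", "user": "bob", "amount": 12})

# Each team reads the full log independently and keeps only what it cares about.
finance_total = sum(e["amount"] for e in log.poll("finance") if e["type"] == "purchase")
fraud_logins = [e["user"] for e in log.poll("fraud") if e["type"] == "login"]
```

The point is that reading never removes anything from the log, so adding another team is just another offset.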

I used Kafka when I made my Reddit r/place clone a few years ago because it scales reads and writes really well. With Postgres as a central source of truth, you can only handle thousands of writes per second, and reads will slow down the instance. With Kafka you can handle about 2M writes/sec. And reads can really easily be serviced from other machines - you can just have a bunch of downstream Kafka instances consuming from the root, and serving your readers in turn.

It may be that you can also solve all these problems with a well-configured RabbitMQ cluster. But coming from a database world, I find it more comfortable to reason about architecture, performance and correctness with Kafka.


Size? If you’re getting fewer than a few hundred events a minute, is it worth setting up Kafka?
This is the main reason I don’t use Kafka much in my own projects. I hope at some point someone makes a Redis equivalent of Kafka for small projects.

Is Rabbit much easier to set up for small projects? I haven’t used it much.

You might be interested in Redis Streams[1], it's basically Kafka in Redis.

[1] https://redis.io/topics/streams-intro

If you're in AWS, you can use Kinesis, which is similar to Kafka. It also ties into a lot of their other offerings, such as:

* S3 - use Kinesis Firehose to take the contents of your Kinesis stream and time-partition it into files, either for ingestion into Redshift, Elasticsearch, etc... or for later batch analysis for ML, or just to treat as cold searchable storage with something like Athena

* DynamoDB - spit out the data into Kinesis from DynamoDB as it changes to create a change stream used elsewhere in your platform. (DynamoDB Streams)

* real time analysis - perform real time SQL analysis (Kinesis Analytics) on what's in your stream over a given window of time or data, and react as events/situations occur.

Looking at all the services that Amazon has built around Kinesis might help you understand some of the differences between Kafka and something like RabbitMQ.
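To make the "window of time" idea in the last bullet concrete, here is a minimal sketch of a tumbling (fixed, non-overlapping) window count in plain Python. It is not Kinesis Analytics or its SQL dialect; the function name, the `(timestamp, event_type)` tuple shape and the 60 second window are assumptions for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, event_type) over fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, event_type in events:
        # Round the timestamp down to the start of the window it falls in.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return counts


# Hypothetical stream of (seconds since start, event type) records.
events = [(0, "click"), (12, "click"), (59, "purchase"), (61, "click"), (119, "click")]
counts = tumbling_window_counts(events, window_seconds=60)
```

A real stream processor does the same grouping incrementally as records arrive, rather than over a finished list.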

Sounds like your org used Kafka for event sourcing. This is almost always a bad idea; event sourcing and aggregate reconstruction are a nightmare IMO.

Kafka used as a pure FIFO cache for regular CRUD endpoints works fine.

Event sourcing was one of LinkedIn's use cases when they created it. Kafka is fine for all logging needs.
Yes; they did. It worked pretty well actually.

Why do you think it’s a bad idea? Most of the arguments against event sourcing that I’ve read seem to be “yes but the tooling isn’t very good”. That might be true, but maybe we solve that problem with more investment into event sourcing; not less of it.

TLDR the tooling is so bad it's basically impossible to run at scale. I worked for a company that tried. Maybe on a small scale it's fine, but replays and storage of past events take insane amounts of space and time at high event rates. To the point that storage costs and replay times became a real problem. (Many terabytes and days)

I also don't think it's a great idea in general. The event stream directly replicates a DB commit log, and the aggregates replicate your tables. It's building your own database.

We had to throw a year's worth of work away at the end so I'm fairly biased against trying it until the ecosystem is better.
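For anyone unfamiliar with the replay problem described above, here is a minimal sketch of aggregate reconstruction: the current state is a fold over every past event, so rebuild cost grows with the length of the log unless you resume from a snapshot. The event shape (`(account, delta)`), `apply_event` and `rebuild` are hypothetical names for illustration, not taken from any particular event sourcing framework.

```python
def apply_event(balances, event):
    """Fold one (account, delta) event into the aggregate, here account balances."""
    account, delta = event
    balances[account] = balances.get(account, 0) + delta
    return balances


def rebuild(events, snapshot=None, snapshot_offset=0):
    """Rebuild the aggregate by replaying events from snapshot_offset onward."""
    balances = dict(snapshot or {})  # copy so the snapshot itself is not mutated
    for event in events[snapshot_offset:]:
        apply_event(balances, event)
    return balances


events = [("alice", 100), ("bob", 50), ("alice", -30), ("bob", 20)]

# Full replay touches every event ever written.
full = rebuild(events)

# Replaying from a snapshot taken after the first two events only touches the tail.
snapshot = rebuild(events[:2])
from_snapshot = rebuild(events, snapshot=snapshot, snapshot_offset=2)
```

Snapshots cut replay time but not storage, and they are one more thing you have to build and keep consistent yourself, which is roughly the "building your own database" complaint.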