by alexchamberlain 1004 days ago
Sorry for the naivety/not obvious from the comments: is that too much or too little? (I've used RabbitMQ much more than Kafka.)
10 comments

Kafka is designed to maximize scalability: millions of messages a second. It's a pain in the neck to manage if you don't need that.
At 7500 records a day, I'd even question RabbitMQ in the design.

If I were judging that at work, my first thought would be that one of our busier postgres clusters with 2 read replicas is chugging through some 2-3k transactions per second without needing much tuning or specialized hardware. The more ETL-oriented clusters are capable of processing some 100M - 200M rows per second when chugging through large queries, and these are just simple 4 core VMs, again on not really specialized hardware. And postgres would parallelize these queries more if you gave it more cores, as long as the queries aren't horrible.

At 7500 records a day or 3 million a year, you wouldn't be able to generate enough data to make one of these databases sweat over many years.
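To put numbers on that, here is a rough back-of-the-envelope sketch. The 2,000 transactions per second figure is the low end of the range quoted above, and treating one record as one transaction is a simplifying assumption, not a measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def describe_load(records_per_day: int) -> dict:
    """Convert a daily record count into per-second and per-year figures."""
    per_second = records_per_day / SECONDS_PER_DAY
    return {
        "per_second": per_second,
        "seconds_between_records": 1 / per_second,
        "per_year": records_per_day * 365,
    }


load = describe_load(7500)

# Assumption: one record costs about one transaction on a cluster doing
# 2,000 transactions per second (the low end of the 2-3k quoted above).
ASSUMED_TPS = 2000
seconds_of_db_work_per_day = 7500 / ASSUMED_TPS

print(f"{load['per_second']:.4f} records/second")
print(f"one record every {load['seconds_between_records']:.2f} seconds")
print(f"{load['per_year']:,} records/year")
print(f"{seconds_of_db_work_per_day:.2f} seconds of database work per day")
```

Under those assumptions, a whole day of records is a few seconds of work for such a cluster, and a year of them is about 2.7 million rows, in the same ballpark as the figure above.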

Hate me as a DBA, but write some good queries for whatever you're doing and run those in a cronjob at that scale.
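A minimal sketch of what that could look like, assuming a hypothetical `records` table with a `processed_at` column. It uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; against postgres you would swap in a real driver and adapt the SQL.

```python
import sqlite3


def process_pending(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Process up to batch_size unprocessed rows and return how many were handled."""
    with conn:  # one transaction: commits on success, rolls back on error
        rows = conn.execute(
            "SELECT id, payload FROM records "
            "WHERE processed_at IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        for row_id, payload in rows:
            # The real work on `payload` would go here.
            conn.execute(
                "UPDATE records SET processed_at = datetime('now') WHERE id = ?",
                (row_id,),
            )
    return len(rows)


# Demo with an in-memory database and a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT, processed_at TEXT)"
)
conn.executemany(
    "INSERT INTO records (payload) VALUES (?)",
    [(f"msg-{i}",) for i in range(5)],
)
conn.commit()

first_run = process_pending(conn)   # picks up the 5 pending rows
second_run = process_pending(conn)  # nothing left to do
print(first_run, second_run)
```

A cron entry would then just invoke a script like this every few minutes. If more than one worker might run at once on postgres, `SELECT ... FOR UPDATE SKIP LOCKED` is the usual way to keep them from grabbing the same rows.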

I once worked on a chat-based system that handled load like this, and it was initially built with kafka. I worked out that the cost per message was several cents, haha. I replaced it with a redis queue, which was all I knew at the time, and it ran on a digitalocean droplet for CAD $5 per month for around 18 months before they scaled it up. It was handling ~100 messages per second peak at the end, which is still very low. The cause for concern was that the droplet silently failed due to memory issues on a particularly busy day, so it seemed reasonable to jump to the next tier to avoid that issue for a while.

For what it's worth I never intended for the 1CPU/1GB VM to go to production, but I was a consultant and they just ran with it. And it worked!

They swore off of kafka forever after the pains they had with it. Another consultant built that system for them, so it wasn't an internal decision exactly and they had no idea what they were getting into. I've heard of similar experiences since. I've sometimes hoped to land on a project where kafka was well suited to the problem, though; I learned a lot about it back then and it seemed incredibly cool. I was kind of envious of all these projects fully utilizing it!

I've used kafka to process data on the order of 50mb to 5000mb / second incoming. It has complexities that are worth eating for that type of use case.

For a message every 10 seconds, its use is ... hmm. It wouldn't be in my top 50 choices.

edit: and to be clear, I'm a huge fan of kafka: it sat there and silently just worked. It was great!

Way too little to justify event-driven architecture (let alone Kafka specifically), unless you have some specialized need, like very slow event processing where you have to display a "message received" notification to the user before the processing happens. Or you really need the retry functionality and can't handle it some other way.

Most businesses have no (hue hue) business doing event driven architecture. There is way too much overhead for local testing and overall complexity, especially when you want to properly handle errors.

"But, every developer should be able to set up their local." Yea, great, explain to the manual QA who may be amazing but just started 3 months ago.

I think we need to separate load from event driven architecture: what if you want to reload data on a client's view, even if that data only gets refreshed once a month? The load is very low (1 message/month), but still requires an event to be pushed to the client to refresh their data.
You probably want to be closer to 7,500 messages/second before Kafka becomes worthwhile.
It's almost nothing.

I'm currently maintaining a system that uses Redis as a message broker and at a rate of ~10 messages/second it marginally makes sense.

It's like using a dump truck to sugar your coffee.
Too little to need Kafka.
too little