by chrsig 1004 days ago
It seems a lot of the complaints weren't about Kafka itself but stemmed from internal communication problems. Custom Kafka message headers could just as well be custom HTTP headers, and the problem would be the same. Kafka is incidental.

Looking at the volume, though, Kafka is overkill. They most likely could have just used the database and reaped the benefits of doing everything in a single transaction, with easier row-level locking. The post acknowledges this.
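
A minimal sketch of the "just use the database" option, with Python's built-in sqlite3 standing in for whatever database the team actually runs (an assumption; the post doesn't say). The state change and the event row commit in one transaction, so one can't exist without the other. The table names and the deposit example are hypothetical, and SQLite locks the whole database on write, so this shows the single-transaction half of the argument, not row-level locking.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Create (if needed) an accounts table and an append-only events table."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS accounts (
            id      INTEGER PRIMARY KEY,
            balance INTEGER NOT NULL
        );
        -- AUTOINCREMENT keeps event ids strictly increasing, even after deletes.
        CREATE TABLE IF NOT EXISTS events (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            kind TEXT NOT NULL,
            body TEXT NOT NULL
        );
        """
    )
    return conn


def deposit(conn, account_id, amount):
    """Change state and append the matching event in ONE transaction."""
    with conn:  # commit on success; roll back both writes if anything raises
        cur = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )
        if cur.rowcount != 1:
            raise KeyError(account_id)  # nothing to update; the with-block rolls back
        conn.execute(
            "INSERT INTO events (kind, body) VALUES (?, ?)",
            ("deposit", json.dumps({"account": account_id, "amount": amount})),
        )


if __name__ == "__main__":
    conn = open_db()
    with conn:
        conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 0)")
    deposit(conn, 1, 50)
    print(conn.execute("SELECT kind, body FROM events").fetchall())
```

Downstream consumers can then read the events table in id order, much as they would read a topic.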

I do think it highlights the need for a small-scale Kafka, though. It's conceptually great to have everything work off of logs, but Kafka does add a non-trivial operational burden.

We've had "small-scale" Kafka for a long time. It's an append-only log, and there are a number of ways to implement it, but it's essentially that.

The thing that makes Kafka interesting is the technique of operating out of the Linux page cache (the kernel's disk write buffer). That's the trick that makes it fast and lets it scale to huge volumes. But if you don't have the scale, you can stand up a table, or RabbitMQ, or anything that manages append-only ordered log entries. There doesn't need to be a new thing... Kafka was the new thing.
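
To make "anything that manages append-only ordered log entries" concrete, here is a minimal single-file log in Python. Each record is length-prefixed and its byte position in the file serves as its offset. Plain writes land in the operating system's page cache; the optional fsync is what forces them to disk. The class name and record layout are illustrative assumptions, not Kafka's on-disk format.

```python
import os
import struct

# Record header: 4-byte big-endian payload length.
_HEADER = struct.Struct(">I")


class FileLog:
    """Append-only log in one file; a record's offset is its byte position."""

    def __init__(self, path, fsync=False):
        self.path = path
        self.fsync = fsync
        self._f = open(path, "ab")  # append mode, created if missing

    def append(self, payload: bytes) -> int:
        offset = self._f.seek(0, os.SEEK_END)  # the new record starts here
        self._f.write(_HEADER.pack(len(payload)) + payload)
        self._f.flush()  # hand the bytes to the kernel (page cache)
        if self.fsync:
            os.fsync(self._f.fileno())  # force them onto the disk
        return offset

    def read_from(self, offset=0):
        """Yield (offset, payload) for every complete record at or after offset."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            while True:
                header = f.read(_HEADER.size)
                if len(header) < _HEADER.size:
                    return  # end of log, or a torn trailing write
                (length,) = _HEADER.unpack(header)
                payload = f.read(length)
                if len(payload) < length:
                    return
                yield offset, payload
                offset += _HEADER.size + length

    def close(self):
        self._f.close()
```

Reading is a sequential scan from any offset, which is the access pattern that makes a page-cache-backed log cheap to serve.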

Yes, there's nothing novel about an append-only log. What's missing (or unbeknownst to me) is a library or small server that provides a good general-purpose implementation.

It's not just a matter of "write to log, done!". Ensuring persistence, keeping track of consumer offsets, transparent compression, waking up consumers on new message availability, support for transactions...

It's not just an append-only log that's wanted; it's a system for managing append-only logs, without complications like leader election, replication, partitioning, etc.
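
A single-process sketch of a few items from that list, deliberately with no leader election, replication, or partitioning: payloads are zlib-compressed on the way in and decompressed on the way out, each named consumer gets its own tracked offset, and consumers blocked in poll() are woken by a threading.Condition when a record is appended. Persistence and transactions are left out, and all names are hypothetical.

```python
import threading
import zlib


class LocalLog:
    """In-memory append-only log with per-consumer offsets (illustrative only)."""

    def __init__(self):
        self._records = []   # compressed payloads; list index == offset
        self._offsets = {}   # consumer name -> next offset to read
        self._cond = threading.Condition()

    def append(self, payload: bytes) -> int:
        with self._cond:
            self._records.append(zlib.compress(payload))  # transparent compression
            self._cond.notify_all()  # wake consumers blocked in poll()
            return len(self._records) - 1

    def poll(self, consumer: str, max_records=10, timeout=None):
        """Return (start_offset, payloads), waiting until something is available."""
        with self._cond:
            start = self._offsets.get(consumer, 0)
            # Returns False on timeout; the slice below is then simply empty.
            self._cond.wait_for(lambda: len(self._records) > start, timeout)
            batch = self._records[start:start + max_records]
            return start, [zlib.decompress(r) for r in batch]

    def commit(self, consumer: str, next_offset: int):
        """Record that `consumer` has processed everything before next_offset."""
        with self._cond:
            self._offsets[consumer] = next_offset
```

Because poll() doesn't advance the offset on its own, a consumer that crashes before calling commit() sees the same records again, which gives at-least-once delivery.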

> Looking at the volume though, kafka is overkill.

Overkill in what sense? The blog post seems to suggest Kafka was already pervasive in their organization, and that they leveraged the existing infrastructure and simply added a couple of topics. How is this overkill?

>small scale kafka, though. It's conceptually great to have everything work off of logs, but kafka does add a non trivial operational burden.

Does something like that exist?

I think it'd be very easy to write your own. I used Postgres's built-in LISTEN/NOTIFY combined with a database table to get a distributed message system.
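
A rough sketch of the table-plus-notification approach, assuming the psycopg2 driver and a hypothetical events table and channel. The table is the source of truth; NOTIFY only tells the listener to re-read it, and Postgres delivers a notification only once the sending transaction commits. The listener holds one dedicated, unpooled connection because LISTEN is session state. One caveat: with a plain BIGSERIAL id, a transaction that took a lower id can commit after one that took a higher id, so the `id > last_id` query can skip rows under concurrent writers; a production version needs to guard against that.

```python
import select

# Hypothetical channel name. LISTEN/NOTIFY take an identifier, not a bind parameter,
# so it is a fixed constant rather than user input.
CHANNEL = "events"

SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS events (
    id      BIGSERIAL PRIMARY KEY,
    payload TEXT NOT NULL
);
"""


def publish(conn, payload):
    """Insert the row and NOTIFY in one transaction; listeners hear it on COMMIT."""
    with conn:  # psycopg2: commits on success, rolls back on error
        with conn.cursor() as cur:
            cur.execute("INSERT INTO events (payload) VALUES (%s)", (payload,))
            cur.execute(f"NOTIFY {CHANNEL}")


def consume(dsn, handle, last_id=0, timeout=5.0):
    """Drain new rows, then sleep until a notification (or the timeout) arrives."""
    import psycopg2  # imported lazily so this module loads without the driver

    conn = psycopg2.connect(dsn)  # dedicated, unpooled connection
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(f"LISTEN {CHANNEL}")
    while True:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, payload FROM events WHERE id > %s ORDER BY id",
                (last_id,),
            )
            for row_id, payload in cur.fetchall():
                handle(payload)
                last_id = row_id
        # Block on the connection's socket until Postgres sends something.
        if select.select([conn], [], [], timeout) != ([], [], []):
            conn.poll()
            del conn.notifies[:]  # the contents don't matter; the table is re-read
```

Re-reading the table on every wakeup also covers notifications sent while the listener was disconnected.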

Writing a distributed, scalable system is really hard, and beyond the API, that is the real value of Kafka.

>I used Postgres's built-in LISTEN/NOTIFY combined with a database table to get a distributed message system.

Every single person I know who's done this says it was a fantastic decision, and the "eventually I'll have to migrate to X" never came.

How do you deal with connection limits? You can't LISTEN through a pooled connection.

It's relatively easy: removing any networking requirements drastically simplifies the problem. There are still some non-trivial bits that vary depending on how fine-grained the concurrency is.

It's a weekend project to demonstrate the concept, and maybe a few weeks to really flesh it out and iron out the quirks. I imagine that if you're willing to use SQLite as a backend for persistence, it gets a bit easier.
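
A sketch of the SQLite-backed variant using only Python's standard sqlite3 module. Consumer offsets live in the same file as the log, so recording progress is a single durable write. Table and function names are hypothetical, and concurrency is whatever SQLite's single-writer locking provides.

```python
import sqlite3


def open_log(path):
    """Open (or create) a SQLite-backed log plus a consumer-offset table."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        PRAGMA journal_mode = WAL;  -- file-backed DBs: readers don't block the writer
        CREATE TABLE IF NOT EXISTS log (
            seq     INTEGER PRIMARY KEY AUTOINCREMENT,
            payload BLOB NOT NULL
        );
        CREATE TABLE IF NOT EXISTS consumer_offsets (
            consumer TEXT PRIMARY KEY,
            last_seq INTEGER NOT NULL
        );
        """
    )
    return conn


def append(conn, payload: bytes) -> int:
    """Append one record and return its sequence number."""
    with conn:
        return conn.execute(
            "INSERT INTO log (payload) VALUES (?)", (payload,)
        ).lastrowid


def fetch(conn, consumer, limit=100):
    """Return up to `limit` (seq, payload) rows the consumer hasn't committed yet."""
    row = conn.execute(
        "SELECT last_seq FROM consumer_offsets WHERE consumer = ?", (consumer,)
    ).fetchone()
    last_seq = row[0] if row else 0
    return conn.execute(
        "SELECT seq, payload FROM log WHERE seq > ? ORDER BY seq LIMIT ?",
        (last_seq, limit),
    ).fetchall()


def commit(conn, consumer, last_seq):
    """Durably record that `consumer` has processed everything up to last_seq."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO consumer_offsets (consumer, last_seq) VALUES (?, ?)",
            (consumer, last_seq),
        )
```

A consumer that stores its own results in the same database could write them and update consumer_offsets inside one `with conn:` block, so a crash can't leave the results and the offset out of step.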

You may be right. However, consider the directorial perspective.

You have employees - you try to get and retain the best talent you can. However, every human has strengths and weaknesses, and these may not all be fully visible to you.

Rolling your own vs buying off the shelf is a gamble on future outages.

Will a third-party support and fix the issue, or have a strong community that can help you work through the issue?

If your best engineer builds something that works for long enough to become entrenched, but then carks it, will your best engineering talent be able to resolve the issue? If your rockstar quits, does the team have to pick through the halls of Cthulhu? Does your organisational ignorance of kernel networking suddenly become painfully apparent?

Remember, you need to be twice as clever to debug code as to write it...

Depending on the meaning of "small-scale Kafka", both RabbitMQ and Redis do support streams.

One of my desires would be for it to be persistent, ideally with the option of different storage tiers, so that as logs became older they could be moved to a less costly medium and transparently fetched when requested.
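
A sketch of the tiered-storage idea under some stated assumptions: the log is split into fixed-size segments, sealed segments older than the newest few are moved from a "hot" directory to a "cold" one (a stand-in for a cheaper medium such as object storage), and reads check both tiers so callers never notice the move. Segment size, directory layout, and names are hypothetical, and the still-open segment lives only in memory here, so this is not durable as written.

```python
import json
import os
import shutil


class TieredLog:
    """Segmented log whose older segments migrate to a cheaper 'cold' tier."""

    def __init__(self, hot_dir, cold_dir, segment_size=3, hot_segments=1):
        self.hot_dir, self.cold_dir = hot_dir, cold_dir
        self.segment_size = segment_size  # records per segment
        self.hot_segments = hot_segments  # newest sealed segments kept hot
        os.makedirs(hot_dir, exist_ok=True)
        os.makedirs(cold_dir, exist_ok=True)
        self.active = []  # records of the segment still being written (memory only)
        self.sealed = 0   # number of sealed segments so far

    @staticmethod
    def _name(index):
        return f"{index:08d}.json"

    def append(self, record):
        self.active.append(record)
        if len(self.active) == self.segment_size:
            # Seal the full segment into the hot tier.
            with open(os.path.join(self.hot_dir, self._name(self.sealed)), "w") as f:
                json.dump(self.active, f)
            self.sealed += 1
            self.active = []
            self._tier_down()

    def _tier_down(self):
        # Move every sealed segment except the newest `hot_segments` to cold storage.
        for index in range(self.sealed - self.hot_segments):
            src = os.path.join(self.hot_dir, self._name(index))
            if os.path.exists(src):
                shutil.move(src, os.path.join(self.cold_dir, self._name(index)))

    def _load(self, index):
        for directory in (self.hot_dir, self.cold_dir):  # try the hot tier first
            path = os.path.join(directory, self._name(index))
            if os.path.exists(path):
                with open(path) as f:
                    return json.load(f)
        raise FileNotFoundError(self._name(index))

    def read(self, offset):
        """Fetch one record by offset, wherever its segment currently lives."""
        index, pos = divmod(offset, self.segment_size)
        if index == self.sealed:
            return self.active[pos]
        return self._load(index)[pos]
```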

Having an event-sourced system doesn't make much sense unless you keep messages from the start of the system. You can snapshot state and resume from the snapshot to quickly rebuild from a known good state, but that doesn't help if a logic error corrupted every state from the start and a full rebuild is required.
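
A toy illustration of why snapshots don't replace the full log, using a hypothetical account-balance reducer. State is a left fold over the events; a snapshot is just an (offset, state) pair that replay can resume from. If the reducer that produced the snapshot was buggy, resuming from it carries the bug forward, and only a replay from offset 0 with the fixed reducer gets the right answer.

```python
from functools import reduce


def apply(state, event):
    """Correct reducer: fold one (kind, amount) event into a balance."""
    kind, amount = event
    if kind == "deposit":
        return state + amount
    if kind == "withdraw":
        return state - amount
    raise ValueError(f"unknown event kind: {kind}")


def buggy_apply(state, event):
    """Hypothetical buggy reducer: treats withdrawals as deposits."""
    _kind, amount = event
    return state + amount


def rebuild(events, reducer=apply, snapshot=(0, 0)):
    """Replay events[offset:] on top of an (offset, state) snapshot."""
    offset, state = snapshot
    return reduce(reducer, events[offset:], state)


def take_snapshot(events, reducer=apply, snapshot=(0, 0)):
    return len(events), rebuild(events, reducer, snapshot)


events = [("deposit", 100), ("withdraw", 30)]
snap = take_snapshot(events, buggy_apply)  # (2, 130): already wrong
events.append(("deposit", 5))
resumed = rebuild(events, apply, snap)     # 135: the bad snapshot poisons the result
replayed = rebuild(events, apply)          # 75: a full replay is correct
```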

I'm unsure how Redis streams behave with regard to cache eviction, nor am I familiar enough with RabbitMQ to comment on its behavior. It's been 10 years since I used either, and at the time neither was a good solution for a log-based system.

Redis doesn't do key eviction by default. Everything lives forever, unless you tell it otherwise. Of course, it is recommended to turn on some form of persistence (and configure backups) so that things don't disappear if the server restarts.

Not that I'm aware of. I've been very tempted to write my own.

I used bash and netcat for a queue like this once. I stashed the messages on disk if the database was down and read them back if the far end was down.
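
The bash-and-netcat queue above is roughly a store-and-forward pattern: if delivery fails, spool the message to disk, and replay the spool before anything new once delivery works again. A Python sketch of that pattern (not of the original script) follows; `send` is a hypothetical callable that raises OSError when the far end is down, and the spool is a newline-delimited text file, so messages must not contain newlines.

```python
import os


class SpoolingSender:
    """Deliver messages via `send`; spool them to a file while delivery fails."""

    def __init__(self, send, spool_path):
        self.send = send  # hypothetical: raises OSError if the far end is down
        self.spool_path = spool_path

    def _spooled(self):
        if not os.path.exists(self.spool_path):
            return []
        with open(self.spool_path) as f:
            return [line.rstrip("\n") for line in f]

    def deliver(self, message: str) -> bool:
        # Replay older spooled messages first so ordering is preserved.
        pending = self._spooled() + [message]
        for i, msg in enumerate(pending):
            try:
                self.send(msg)
            except OSError:
                # Rewrite the spool with everything not yet delivered.
                with open(self.spool_path, "w") as f:
                    f.writelines(m + "\n" for m in pending[i:])
                return False
        if os.path.exists(self.spool_path):
            os.remove(self.spool_path)  # everything went through
        return True
```

A crash between a successful send and the spool rewrite would resend messages on restart, so the receiver should tolerate duplicates.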

The thing Kafka really brings to the table is trusting the pipe. In a case where writing to disk queues happened often enough, I would run into reliability problems building my own system; Kafka handles that kind of indexing instead.