by jondot 3515 days ago
To me, Kafka as a job queue is a painful impedance mismatch. To make it work, you need to:

1. Figure out which Kafka broker version you're using. The consumer concept and consumer APIs in 0.7 differ from those in 0.8, which differ from those in 0.9, and differ again in 0.10. They range from non-existent, to quirky, to finally a good design that works - but you still need to make sure offsets are committed in time.

2. Offset management is a thing. If you're lucky and using recent brokers you're good, but you'll still have to make sure the timings of offset commits and heartbeats interleave in such a way that Kafka doesn't think your consumers are dead. If you have trouble imagining this scenario - exactly. This was a very hard race condition to find, and it's solely up to you and your motivation to fix it rather than write it off as "a random glitch".

3. Kafka clients are still radically different from one another and support different broker versions. You need to be lucky enough to use a language and platform that is in harmony with the latest consumer APIs. However - I'm certain this will converge; the wider ecosystem will just converge more slowly.

4. For the reasons Xorlev mentioned, you will find yourself up against the wall, making sure each job is idempotent. Suddenly - this becomes a people management problem too.
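
To make the timing problem in (2) concrete, here is a minimal sketch in plain Python, not real Kafka client code. `FakeCoordinator` and `run_job` are hypothetical names, and the sketch assumes a client that only sends heartbeats from its poll loop, which depends on the client and version you use. A job that runs longer than the session timeout makes the consumer look dead, so its commit is rejected:

```python
from dataclasses import dataclass


@dataclass
class FakeCoordinator:
    """Hypothetical stand-in for the broker-side group coordinator."""
    session_timeout: float
    last_heartbeat: float = 0.0

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def is_alive(self, now: float) -> bool:
        return now - self.last_heartbeat <= self.session_timeout

    def commit(self, now: float, offset: int) -> bool:
        # In this sketch, a commit from a member considered dead is rejected.
        return self.is_alive(now)


def run_job(coordinator: FakeCoordinator, start: float, job_seconds: float) -> bool:
    """Poll (which heartbeats), work for job_seconds, then try to commit."""
    coordinator.heartbeat(start)      # assumption: heartbeat only at poll time
    finished = start + job_seconds    # no heartbeats while the job runs
    return coordinator.commit(finished, offset=1)


coordinator = FakeCoordinator(session_timeout=30.0)
fast_ok = run_job(coordinator, start=0.0, job_seconds=5.0)
slow_ok = run_job(coordinator, start=100.0, job_seconds=45.0)
print(fast_ok, slow_ok)  # True False
```

The slow job did run to completion; only its commit failed, which is exactly why it looks like "a random glitch" from the outside.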

All of these can (and probably _will_) be solved; however, I feel that (2) and (4) will always be there, because they are part of why Kafka is so great.
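
A rough sketch of why (4) follows from (2), again in plain Python with made-up names (`FakeLog`, `consume`, the two handlers), not a real client API: if a consumer dies after running a job but before committing its offset, the job is delivered again, so only an idempotent handler avoids doing the work twice.

```python
class FakeLog:
    """Hypothetical in-memory stand-in for one topic partition."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message to hand out

    def poll(self):
        # Everything from the last committed position onwards.
        return list(enumerate(self.messages))[self.committed:]

    def commit(self, offset):
        self.committed = offset + 1


def consume(log, handler, crash_before_commit_at=None):
    for offset, job in log.poll():
        handler(job)
        if offset == crash_before_commit_at:
            return  # simulated crash: the job ran, the offset was never committed
        log.commit(offset)


naive_sent = []

def naive_handler(job):
    naive_sent.append(job["id"])  # side effect happens on every delivery


done_ids = set()  # in a real system this would need to be durable storage
safe_sent = []

def idempotent_handler(job):
    if job["id"] in done_ids:
        return  # already done, so a redelivery is a no-op
    safe_sent.append(job["id"])
    done_ids.add(job["id"])


jobs = [{"id": "a"}, {"id": "b"}, {"id": "c"}]

for handler in (naive_handler, idempotent_handler):
    log = FakeLog(jobs)
    consume(log, handler, crash_before_commit_at=1)  # crash after running "b"
    consume(log, handler)                            # restart from committed offset

print(naive_sent)  # ['a', 'b', 'b', 'c']
print(safe_sent)   # ['a', 'b', 'c']
```

The deduplication by job id is only one way to get idempotency; the point is that every job author has to think about it, which is where the people management problem comes in.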

In addition, I think Kafka is one product for which you _must_ read the "whitepaper"[0] before you build consumers for it. The first reason is that it's an innovative design that might come in handy in everyday life if you're an engineer. The second reason is to understand the founding context in which it was created - logs - to see why so many tradeoffs were made for it to be amazing at that, and to realize that this original founding context was _not_ transactional jobs.

Switching gears now. Many organizations find Kafka to be a much-needed cure for data processing pipelines, and for making events and messaging a first-class citizen in the organization from a _data_ point of view. For that, Kafka is amazing. With it, you can realize the dream of having an "event mart" where groups and teams consume and publish their processed view of the world as a message stream, and someone else can pick up that stream and build a completely different product on top of it (not a perfect example, but one we can all relate to - think about Twitter's firehose).

The perception problem is that once this floods the organization, people assume it takes little effort to apply the same mindset to building _operational_ and transactional queue systems, where you don't process events or data, but perform tasks. Unfortunately that's not true. I'd be happy if there were stronger education about this from Kafka's side.

For the kq project - I wish it the best of luck and I'd be interested to see it unfold. The code is very clean and I feel it's inviting to just read and learn from it - kudos!

[0] http://www.longyu23.com/doc/Kafka.pdf

1 comments

Thank you for the excellent feedback and insight. I will definitely give the pdf a read. I agree with all of your points, which admittedly I was not fully aware of when I first embarked on this project. As you've already implied, there are some nuances that may forever be inevitable due to the inherent design of Kafka. But I wouldn't want to dismiss it as unsuitable for job queues so early. It would depend strongly on the use case of course (e.g. jobs that are idempotent or have no hard requirement to be processed), but I am hoping that Kafka's API matures further with finer control over messages, and that this could then work fairly well for the most part. For now I will take note and update the documentation to clearly explain what KQ is (and what it is not), and which best practices and use cases must be taken into account before using it. Thanks again!