Hacker News
by donglitao 809 days ago
The Kafka API and S3 API have each become the de facto standard in the stream processing and object storage domains, respectively. Compared to other Kafka solutions, AutoMQ has embraced them more effectively. I believe AutoMQ has a long way to go.
1 comments

It's a bit unfortunate IMO, because Kafka, while perfectly adequate for streaming, often winds up with someone wanting to use it like a work queue, which predictably ends horribly.

Ideally something like Pulsar's API (which more closely resembles GCP Pub/Sub) would be the de facto standard, as it is capable of handling both cases seamlessly.

There was Kafka-on-Pulsar until recently, when the developers essentially made it non-OSS, which is pretty unfortunate for Pulsar adoption, which is already low.

Can you elaborate more on the part where using Kafka as a work queue ends horribly? Genuinely interested!

Essentially the problem is that Kafka doesn't have a way of managing acknowledgements on a per-message basis. This means that consumers of Kafka topics are assigned exclusive access on a per-partition basis, and the consumer group manager only tracks the acknowledged offset of each partition.

As such, you end up with two main problems. The first is head-of-line blocking: if a consumer reads a message that it is unable to process, or that will take an inordinate amount of time to process, it can't move forward without either risking that the message is never processed correctly or later replaying every message since the problematic one. The second is hot partitions: if the "cost" of messages isn't uniformly distributed across partitions, load isn't balanced across consumers, i.e. there is no work stealing or other mechanism for other consumers to help process a hot partition.
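
To make the head-of-line blocking concrete, here is a toy sketch in plain Python. It is not Kafka client code: `Partition`, `consume_cumulative` and the "poison" message are all made up for illustration, and the only thing it models is a single committed offset per partition.

```python
from dataclasses import dataclass

@dataclass
class Partition:
    """Toy stand-in for one partition: an append-only list plus one committed offset."""
    log: list
    committed: int = 0  # next offset to read; everything below counts as acknowledged

def consume_cumulative(partition, handler):
    """Process messages in order, tracking only a single offset for the partition.

    Stops at the first message the handler cannot process, because moving the
    offset past it would mark the failed message as done.
    """
    processed = []
    while partition.committed < len(partition.log):
        msg = partition.log[partition.committed]
        if not handler(msg):
            break  # head-of-line blocking: everything behind this message waits
        processed.append(msg)
        partition.committed += 1
    return processed

def handler(msg):
    # Hypothetical handler: "poison" stands for a job that cannot be processed.
    return msg != "poison"

partition = Partition(log=["a", "b", "poison", "c", "d"])
done = consume_cumulative(partition, handler)
stuck = partition.log[partition.committed:]
print(done, stuck)
```

Messages "c" and "d" are perfectly fine, but they sit behind the bad one, and no other consumer in the group can pick them up because the partition is assigned exclusively.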

Log systems with queue/subscription overlays, like Pulsar and GCP Pub/Sub, solve this by doing per-message acknowledgement (sometimes referred to as selective acknowledgement, as opposed to the cumulative acknowledgement that Kafka does), usually by layering a persistent subscription abstraction over the top of the underlying log.
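
Roughly, the bookkeeping behind selective acknowledgement can be sketched like this. It is a simplified illustration, not how Pulsar or Pub/Sub actually implement it; `Subscription`, `ack_floor` and `pending` are names I made up.

```python
class Subscription:
    """Toy per-message acknowledgement tracker layered over an append-only log."""

    def __init__(self, log):
        self.log = log
        self.ack_floor = 0   # every offset below this has been acknowledged
        self.acked = set()   # individually acknowledged offsets at or above the floor

    def ack(self, offset):
        """Acknowledge one message, then advance the floor over any contiguous acks."""
        self.acked.add(offset)
        while self.ack_floor in self.acked:
            self.acked.remove(self.ack_floor)
            self.ack_floor += 1

    def pending(self):
        """Offsets that still need to be delivered or redelivered."""
        return [o for o in range(self.ack_floor, len(self.log)) if o not in self.acked]

sub = Subscription(log=["a", "b", "poison", "c", "d"])
for offset in (0, 1, 3, 4):  # everything except the unprocessable message at offset 2
    sub.ack(offset)
print(sub.ack_floor, sub.pending())
```

Only the message at offset 2 stays pending, so it can be redelivered or parked on its own while the rest of the log is considered done.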

This is in contrast to pure queue systems like RabbitMQ, SQS, etc. that use a heap or mailbox approach, where messages are simply emptied out as they are processed and don't share the log-style structure of systems like Kafka.

So, TL;DR: if you use Kafka like a job queue, you will end up in situations where queue processing gets stuck behind a single unprocessable message or a patch of them.

The mitigations for it aren't pretty. They involve either building your own selective acknowledgement layer or setting up a series of retry queues that messages are pushed onto using Kafka transactions, with a final dead-letter queue at the end, etc. Instead, either wait for https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A... if you really want Kafka, or use something that already does this, i.e. Pulsar.
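
For the retry-queue approach, the shape is something like the sketch below. It is an in-memory simulation only: the topic names, `run` and the handler are hypothetical, the "topics" are just deques, and it leaves out the Kafka transactions you would need to make each hand-off atomic.

```python
from collections import deque

# Hypothetical topic chain: main topic, two retry topics, then a dead-letter topic.
TOPICS = ["jobs", "jobs.retry.1", "jobs.retry.2", "jobs.dlq"]

def run(messages, handler):
    """Drain each topic in turn; failures move to the next topic instead of blocking."""
    queues = {name: deque() for name in TOPICS}
    queues["jobs"].extend(messages)
    done = []
    for i, topic in enumerate(TOPICS[:-1]):
        next_topic = TOPICS[i + 1]
        while queues[topic]:
            msg = queues[topic].popleft()
            if handler(msg, attempt=i + 1):
                done.append(msg)
            else:
                # A real setup would publish here and commit the offset atomically.
                queues[next_topic].append(msg)
    return done, list(queues["jobs.dlq"])

def handler(msg, attempt):
    # Hypothetical behaviour: "flaky" succeeds on its second attempt, "poison" never does.
    if msg == "poison":
        return False
    if msg == "flaky":
        return attempt >= 2
    return True

done, dead = run(["a", "poison", "b", "flaky", "c"], handler)
print(done, dead)
```

The main topic never stalls, but you pay for it with extra topics, lost ordering, and more moving parts to operate.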

Agree that Kafka is best suited for stream scenarios. However, we also see Kafka being extensively used in online business scenarios.