by boredandroid 3278 days ago
Dear hacker news:

1. Please read the section entitled "Is this Magical Pixie Dust I can sprinkle on my app?" before making angry comments. Answer: for general consumer apps just consuming messages, no. However, Kafka's design, which lets the consumer control its position in the log, combined with this feature, which eliminates duplicates in the log, makes building end-to-end exactly-once messaging using the consumer quite approachable. For stream processing using Kafka's Streams API (https://www.confluent.io/blog/introducing-kafka-streams-stre...), where you are taking input streams, maintaining state, and producing output streams, the answer is that it actually kind of is like magic pixie dust: you change a single config and get exactly-once processing semantics on arbitrary code. Obviously you still need to get the resulting data out of Kafka, but when combined with any off-the-shelf Kafka connector that maintains exactly-once you can get this for free. So for that style of app design you actually can get correct results end-to-end without needing to do any of the hard stuff.

2. Someone is going to come and say "FLP means that exactly once messaging is impossible!" or something else from a half-understood tidbit of distributed systems theory they picked up on a blog. Let me preempt that. FLP is about the impossibility of consensus in a fully asynchronous setting (e.g. no timeout-based failure detection). Of course, as you know, the vast majority of the systems you use in AWS or your own datacenter depend in deep ways on consensus. Kafka itself is a CP log, about as close to a direct analog of consensus as you could ask for. Obviously Kafka and all these systems are "impossible" in the same sense: if you make the network or other latency issues bad enough, you can make the system unavailable for writes. This feature doesn't change that at all, it just piggybacks on the existing consensus Kafka does. It doesn't violate any theorems in distributed systems theory: Kafka, like any consensus-based system, can't work in a fully asynchronous setting. Kafka was a CP system in the CAP sense prior to this feature, and this feature doesn't change that guarantee.
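
To make the "eliminates duplicates in the log" part of point 1 a bit more concrete, here is a toy sketch of the general idea: each producer tags its writes with a sequence number, and the log drops any append it has already seen. This is an in-memory illustration only; `ToyLog`, `producer_id`, and `seq` are names made up for the example, not Kafka's actual classes or wire format.

```python
class ToyLog:
    """Append-only log that drops retried appends (toy model, not Kafka's code)."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, value):
        """Append value unless this producer already wrote this sequence number.

        Returns the record's offset on a fresh write, or None for a duplicate.
        """
        if seq <= self.last_seq.get(producer_id, -1):
            return None  # a retry of something already in the log
        self.last_seq[producer_id] = seq
        self.records.append(value)
        return len(self.records) - 1


log = ToyLog()
log.append("p1", 0, "a")
log.append("p1", 1, "b")
log.append("p1", 1, "b")  # producer retried after a lost ack
log.append("p1", 2, "c")
print(log.records)
```

The retried write leaves the log unchanged, so a producer can safely resend whenever an acknowledgement goes missing. The sketch assumes sequence numbers arrive in order and ignores replication and producer restarts entirely.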

For those who want a deeper dive into how it all works there is a longer write up on the design here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+E... The use for stream processing is described here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A...

1 comments

Note that you can in fact solve the exactly-once delivery problem in the FLP model. It is much easier than consensus, because once message receipt is acknowledged, that fact does not have to become common knowledge.

The only serious objection to the idea of exactly-once delivery is that in some models of computation, you can't even solve it on a single node. It may be impossible to process a message and record the fact that this processing has been done in a single atomic step, which makes any node failure between those two steps unrecoverable without reprocessing. This is exactly the objection raised in the referenced article "You Cannot Have Exactly-Once Delivery". I don't think it is a particularly illuminating observation, though, especially since it has nothing to do with the distributed nature of the system.
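
For what it's worth, here is a minimal single-node sketch of the case where the atomic step is available: the effect of processing and the "I processed this" record live in the same transactional store, so they commit together or not at all. It uses Python's standard `sqlite3` module; the `balance` and `progress` tables and the `process` function are hypothetical names chosen for the example.

```python
import sqlite3


def process(conn, offset, amount):
    """Apply one message and record its offset in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute("SELECT last_offset FROM progress").fetchone()
        if offset <= row[0]:
            return False  # already processed; redelivery is a no-op
        conn.execute("UPDATE balance SET total = total + ?", (amount,))
        conn.execute("UPDATE progress SET last_offset = ?", (offset,))
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balance (total INTEGER)")
conn.execute("CREATE TABLE progress (last_offset INTEGER)")
conn.execute("INSERT INTO balance VALUES (0)")
conn.execute("INSERT INTO progress VALUES (-1)")
conn.commit()

# Offset 1 is delivered twice, as it would be after a crash and restart.
for offset, amount in [(0, 10), (1, 5), (1, 5), (2, 7)]:
    process(conn, offset, amount)

total = conn.execute("SELECT total FROM balance").fetchone()[0]
print(total)
```

This only works because both updates go to the same database. If the effect of processing is external (sending an email, calling another service), there is no shared transaction to lean on, and that is the gap the objection above is pointing at.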