by Dionakra 1789 days ago
Exactly-once semantics is virtually impossible to achieve. There is a lot of literature on this all over the internet, but an easy example could be the following.

Imagine a client sending messages to RabbitMQ with a retry policy for when it doesn't receive the ACK from the broker. If the client sends a message and doesn't receive the ACK within the configured window (maybe a 30s timeout), it will send the message again, as the client assumes that the brokers never received it. But it could be that the ACK back to the client is what got lost. The brokers actually received and stored the message, but the client doesn't know that, so it retries.

If you don't have some deduplication control over the messages, this retried message is _new_ to RabbitMQ, so it will store it and send back the ACK. Maybe this time the ACK arrives and no further retries are made.
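A minimal sketch of that failure mode, assuming a toy in-memory broker. `ToyBroker`, `drop_acks` and `send_with_retry` are hypothetical names made up for illustration; this is not RabbitMQ's client API, and the lost ACK is simulated by simply returning `False`.

```python
class ToyBroker:
    """Hypothetical in-memory broker, only for illustration."""

    def __init__(self, drop_acks=0):
        self.stored = []
        self.drop_acks = drop_acks  # how many ACKs get lost on the way back

    def publish(self, body):
        self.stored.append(body)    # the broker always stores the message
        if self.drop_acks > 0:
            self.drop_acks -= 1
            return False            # ACK lost: the client only sees a timeout
        return True                 # ACK reached the client


def send_with_retry(broker, body, max_attempts=3):
    """Resend until an ACK arrives; return the number of attempts made."""
    for attempt in range(1, max_attempts + 1):
        if broker.publish(body):
            return attempt
    raise TimeoutError("no ACK received")


broker = ToyBroker(drop_acks=1)
attempts = send_with_retry(broker, "order-42")
print(attempts, broker.stored)  # 2 ['order-42', 'order-42']
```

The client behaved correctly from its own point of view, yet the broker now holds two copies of the same message.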

In this scenario, the brokers end up storing the same message twice. By adding this kind of control (Kafka does more or less the same by discarding messages with already processed IDs when configured as exactly-once) you can try to avoid duplicates. Of course, it is limited by memory and is not in fact exactly-once semantics, so it is now being called _effectively exactly-once semantics_, which is more precise.
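To make the memory limit concrete, here is a rough sketch of ID-based deduplication with a bounded window. `DedupBroker` and its `window` parameter are assumptions for illustration only; neither Kafka nor RabbitMQ is implemented this way, the sketch just shows the general idea of remembering a limited number of message IDs.

```python
from collections import OrderedDict


class DedupBroker:
    """Hypothetical broker that remembers only the last `window` message IDs."""

    def __init__(self, window=1000):
        self.stored = []
        self.seen = OrderedDict()   # insertion-ordered set of recent IDs
        self.window = window

    def publish(self, msg_id, body):
        if msg_id in self.seen:
            return True             # known ID: ACK again, store nothing
        self.stored.append(body)
        self.seen[msg_id] = None
        if len(self.seen) > self.window:
            self.seen.popitem(last=False)  # forget the oldest ID
        return True


dedup = DedupBroker(window=2)
dedup.publish("a", "first")
dedup.publish("a", "first")    # immediate retry: discarded
dedup.publish("b", "second")
dedup.publish("c", "third")    # window is full, so ID "a" is forgotten
dedup.publish("a", "first")    # late retry: stored again as if it were new
print(dedup.stored)            # ['first', 'second', 'third', 'first']
```

The immediate retry is caught, but the late one slips through once its ID has been evicted, which is why "effectively" is the honest qualifier.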

1 comment

"Exactly-once semantics is virtually impossible to achieve".

Impossible to achieve for a broker, yes. But the combination of at-least-once delivery everywhere upstream and deduping in the consumer gives that consumer exactly-once processing.
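A minimal sketch of what deduping in the consumer could look like, assuming every message carries a stable ID. `DedupConsumer`, `handle` and the sample deliveries are made up for illustration. In a real system the ID set would have to be durable and updated atomically with the side effect (for example in the same database transaction), which this in-memory version skips.

```python
class DedupConsumer:
    """Hypothetical consumer that applies each message ID at most once."""

    def __init__(self):
        self.processed_ids = set()
        self.total = 0

    def handle(self, msg_id, amount):
        if msg_id in self.processed_ids:
            return False            # redelivery: skip the side effect
        self.total += amount        # the actual processing
        self.processed_ids.add(msg_id)
        return True


# At-least-once delivery upstream: m1 and m2 each arrive twice.
deliveries = [("m1", 10), ("m2", 5), ("m1", 10), ("m2", 5), ("m3", 1)]
consumer = DedupConsumer()
applied = [consumer.handle(msg_id, amount) for msg_id, amount in deliveries]
print(applied, consumer.total)  # [True, True, False, False, True] 16
```

Five deliveries arrive, but each of the three distinct messages is processed once, so the total is 16 rather than 31.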

Lots of people don't grok that, and they can be duped into buying certain products with claims that they offer "exactly once". Kafka does or used to do this; SQS (FIFO) also gives this false impression, although they are a little more subtle about it. As soon as one vendor does it, they all have to do it.

"Did you hear Rabbit doesn't dedupe messages?!"