|
|
|
|
|
by Uberzi
1780 days ago
|
|
I don't get this ... Why is it the broker's responsibility to make sure the publishers are not duplicating ? Rather than fixing the code and fix the root cause ? With this feature, the producer still needs to implement a strictly increasing sequence value, make sure its state is persisted, and use that as the publishing ID. What if you need to restore your DB after a crash, and need to reprocess data to catch up ? If you stored your publishing ID sequence in that DB, your "smart" broker will happily drop these messages ... !? Unless there's something I don't understand, that sounds like a really bad "good idea". |
|
Imagine a client sending messages to RabbitMQ with some retry policy if it doesn't receive the ACK from the broker. If the client sends the message and it doesn't receive the ACK in the terms it has been defined (maybe a timeout of 30s), it will retry to send the message, as the consumer assumes that the message hasn't been received by the brokers, but it could be that the ACK back to the client is the one that failed to be sent to the client. The brokers actually saved the message, and they stored it, but the client doesn't know, so it retries the message.
If you don't have some control over the messages, this retried message is _new_ to RabbitMQ, so it will store it and send back the ACK. Maybe this time it is successful and no other retries are made.
With this scenario, the brokers would have received the same message twice. By adding this kind of control (Kafka does more or less the same by discarding messages with already processed IDs when configured as exactly-once) you can try to avoid duplicates. Of course it is limited by memory and it is not in fact exactly-once semantics, so they are calling it now _effectively exactly-once semantics_, as it is more precise.