| HN Mirror


by crazygringo 1204 days ago

Which the author admits three quarters of the way through:

> The way we achieve exactly-once delivery in practice is by faking it. Either the messages themselves should be idempotent, meaning they can be applied more than once without adverse effects, or we remove the need for idempotency through deduplication.

Honestly I don't get why this is "faking it" though. It seems like the author's definition of "exactly once" is so purist as to essentially be a strawman. This is "exactly once" in practice.

Like are there other people claiming that this purist version of exactly-once does exist?
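As a rough illustration of the two options in the quoted passage, here is a minimal sketch in Python. The message shapes and the function names `apply_idempotent` and `apply_with_dedup` are hypothetical, not taken from the article or any broker's API; the loop simply simulates at-least-once delivery by handing the same message to the consumer twice.

```python
# Hypothetical message shapes and handlers; a sketch, not a real broker API.

def apply_idempotent(state: dict, msg: dict) -> None:
    # "Set balance to X" gives the same result however often it is applied.
    state[msg["account"]] = msg["balance"]


def apply_with_dedup(state: dict, seen: set, msg: dict) -> None:
    # "Add X to balance" is not idempotent, so duplicates are dropped by ID.
    if msg["id"] in seen:
        return
    seen.add(msg["id"])
    state[msg["account"]] = state.get(msg["account"], 0) + msg["amount"]


# At-least-once delivery: each message arrives twice.
state_a: dict = {}
set_msg = {"id": "m1", "account": "alice", "balance": 100}
for _ in range(2):
    apply_idempotent(state_a, set_msg)

state_b: dict = {}
seen: set = set()
add_msg = {"id": "m2", "account": "alice", "amount": 100}
for _ in range(2):
    apply_with_dedup(state_b, seen, add_msg)
```

In both cases the effect lands once even though delivery happened twice, which is the "exactly once in practice" being described. Note that the in-memory `seen` set sidesteps the hard parts (persistence, crashes, multiple consumers).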

6 comments

nimih 1204 days ago

> Like are there other people claiming that this purist version of exactly-once does exist?

In my experience, the purist version of "exactly-once" exists as a vague, wishy-washy mental model in the brains of developers who have never thought hard about this stuff[0]. Like, once you sketch out why idempotency is important and how to do it, folks seem to pick up on it pretty quickly, but not everyone has trained their intuition to where they automatically notice these sorts of failure modes.

[0] I don't mean this as a slight against those developers--the issues that arise from distributed systems are both myriad and subtle, and if you've spent your time learning how to make beautiful web pages or cool video games or efficient embedded systems, it seems reasonable to not know anything about the accursed problems of hypothetical Byzantine Generals. Or maybe you're fresh out of a bootcamp or an undergraduate program and haven't yet been trained to expect computers to always and constantly fail in every possible way.


cowl 1204 days ago

Because both of these "solutions" are not part of the delivery mechanism but part of your problem space. So the delivery system is not guaranteeing even a fake exactly-once delivery; it's your usage that makes it a fake exactly-once. What's more, both of these solutions are very hard in practice. Idempotency can be applied only in special circumstances where you can design for it. A "Prepare an order" message, for example, can't be idempotent: it has side effects and will prepare a new order every time you receive the message. So you go the deduplication route by considering the OrderID, but if you have several workers processing these messages, how do you handle deduplication? If the first worker has never acked the processing, do you deliver it to a new worker in the queue? How does the new worker know if someone else is processing the same OrderID? A central database? You are only kicking the can down the road...
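To make the "central database" option concrete, here is a minimal sketch using Python's standard `sqlite3` module as a stand-in for a shared store. The `claims` table, the `try_claim` function, and the worker names are assumptions made up for illustration; the only real mechanism relied on is a primary-key constraint rejecting a second insert for the same OrderID.

```python
import sqlite3


def try_claim(conn: sqlite3.Connection, order_id: str, worker: str) -> bool:
    """Return True if this worker won the claim for order_id (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO claims (order_id, worker) VALUES (?, ?)",
                (order_id, worker),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists: another worker claimed this order.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (order_id TEXT PRIMARY KEY, worker TEXT)")

first = try_claim(conn, "order-42", "worker-a")
second = try_claim(conn, "order-42", "worker-b")  # redelivered duplicate
```

This only answers "is someone else processing the same OrderID?". It does not answer what happens if `worker-a` dies after claiming and never finishes, so the claim would still need a timeout or lease, which is arguably the can being kicked down the road.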


majormajor 1204 days ago

It can be very hard to get idempotency right.

It can get way harder when your initial design made incorrect assumptions about the delivery semantics you were using, so you didn't know you'd need it.

Edit, for example:

Someone could have a low-latency problem that seems like it could be a fit for a streaming application. They could look at docs and see "ooh, with Flink I can do exactly-once writes to Kafka" in one place, and choose to use that. But if they don't dig deeply into what that means, they may miss the latency impacts of having to checkpoint every time to commit a set of writes to Kafka. And by the time they figure this out, managing both "low latency" and "exactly once" in the code they wrote might be a really hairy problem.


hn_go_brrrrr 1204 days ago

The distinction is how you design. You don't need idempotence with a mythical "exactly once" system. Conversely, when you're debugging a system built on top of "at least once", you need to keep that property in mind in case the bug you're tracking down is lost idempotence.


kevincox 1204 days ago

Because idempotence can be very hard to achieve. You usually can't just write the message ID to a DB and ignore messages with a matching ID because if you crash while processing then you need to start over again. But you can't just write it at the end because then all of your processing steps need to be idempotent (so why are you bothering to write the ID?).

I've seen very few systems that have general idempotency baked in. Often it ends up being specific to the application. In some cases you can have simple solutions like upon crashing reload all of the state from an authoritative source. In some cases your messages result in simple idempotent operations such as "insert message with a unique ID" or "mark a message with a unique ID as read" but even then these are becoming quite related to business logic.

Basically idempotency is a powerful tool to create a solution but it is no silver bullet. That is why it is important to understand the underlying problem.
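One common way around the "write the ID first or last?" dilemma from the first paragraph, in the narrow case where the message's only side effect lives in the same database as the ID, is to write both in a single transaction. The sketch below uses Python's standard `sqlite3` module; the `processed` and `totals` tables, the `handle` function, and the `Crash` exception (standing in for the process dying mid-way) are all hypothetical.

```python
import sqlite3


class Crash(Exception):
    """Stands in for the process dying in the middle of processing."""


def handle(conn: sqlite3.Connection, msg_id: str, amount: int, crash: bool = False) -> None:
    try:
        with conn:  # one transaction: ID marker and side effect commit together
            conn.execute("INSERT INTO processed (msg_id) VALUES (?)", (msg_id,))
            if crash:
                raise Crash()
            conn.execute(
                "UPDATE totals SET value = value + ? WHERE name = 'sum'",
                (amount,),
            )
    except sqlite3.IntegrityError:
        pass  # ID already recorded: a duplicate of a message that was applied
    except Crash:
        pass  # transaction rolled back, so the ID marker is gone too


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE totals (name TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO totals VALUES ('sum', 0)")
conn.commit()

handle(conn, "m1", 10, crash=True)  # crash: neither the ID nor the update persists
handle(conn, "m1", 10)              # redelivery: applied
handle(conn, "m1", 10)              # duplicate: ignored
total = conn.execute("SELECT value FROM totals WHERE name = 'sum'").fetchone()[0]
```

This works only because the effect and the marker share one transactional store. As soon as processing touches something outside it (sending an email, calling another service), the problem described above comes back, and the solution ends up tied to the application.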


pksebben 1203 days ago

Reading your comment, it dawned on me: there is a way to theoretically ensure exactly-once delivery.

1. buy plane ticket
2. bring box to recipient
3. plug in Ethernet & send message

keep an eye out for our IPO


yencabulator 1203 days ago

That's at-most-once.
