by graemefawcett 34 days ago
The other cases are that the original request is still in flight or that it never occurred. The former case was explained by the prior comment: one request is processed and the other is rejected with a 409. The system cares little which is which, and neither should the caller. The latter case is handled by clients retrying until a request is received, at which point one of the other three states takes over.

Whether a prior request exists in the system in a processed or unprocessed state should not matter in a properly implemented idempotent system. The whole point is that one and only one is processed, and that all replicas are flagged as such.

What you do inside your boundary to implement that idempotent contract need not be part of the contract, and the decision of which primitives to use (locking, content-based addressing, etc.) is mainly a question of implementation constraints.

> The other cases are the original request is still in flight or never occurred.

I'm not sure what you mean by "in flight". The case I'm asking about is where the original request was received by the server and is being processed--and then a second request comes in with the same idempotency key. The original request has not succeeded, and has not failed--it's still in process. What response does the second request get? I do not see an answer to that question anywhere in this thread.

The answer is "the same thing as every other concurrency conflict between two requests". In modern backend development this is most commonly handled by the database, and the practical result is that (from the client's perspective) the requests will block, and only one will "actually succeed".

Here's a typical example, assuming serializable isolation in a database that uses optimistic concurrency.

* Two simultaneous requests come in to create a payment.

* The requests provide an idempotency key that is expected to be unique (possibly scoped to a tenant).

* The first request starts a transaction and starts processing, everything looks good - no dups.

* The second request starts a transaction and starts processing, everything looks good - no dups.

* The first one commits and returns success.

* The second tries to commit, but a conflict is detected (the first txn committed first). Typically this causes the second transaction to retry.

* On retry, the second transaction detects the duplicate.

The only question here is what happens when the second transaction fails? The Stripe model is "look up the original response and hand that back to the client". An equally valid and much easier to implement solution is "return a response that tells the client that there was a conflict".

Both solutions offer "create payment" as an idempotent operation.
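
To make the two response strategies concrete, here is a minimal single-threaded sketch. Everything in it (`PaymentService`, `Response`, the `replay_original` flag, the `pay_N` ids) is a hypothetical name invented for illustration, and an in-memory dict stands in for the database. It is not Stripe's implementation, only the shape of the two options as described above.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: dict


class PaymentService:
    """Hypothetical in-memory stand-in for a payments backend."""

    def __init__(self, replay_original: bool):
        # True: hand back the stored response. False: report a conflict.
        self.replay_original = replay_original
        self._responses = {}  # idempotency key -> first Response

    def create_payment(self, key: str, amount: int) -> Response:
        original = self._responses.get(key)
        if original is not None:
            if self.replay_original:
                # "Look up the original response and hand that back."
                return original
            # "Tell the client that there was a conflict."
            return Response(409, {"error": "duplicate idempotency key"})
        payment_id = f"pay_{len(self._responses) + 1}"
        response = Response(201, {"payment_id": payment_id, "amount": amount})
        self._responses[key] = response
        return response

    def payment_count(self) -> int:
        return len(self._responses)
```

Either way, calling `create_payment` twice with the same key leaves exactly one payment recorded; only what the second caller sees differs.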

> The second request starts a transaction and starts processing, everything looks good - no dups.

So when the second request comes in, even though it has the same idempotency key as the first request, the server doesn't check to see if there's already a request received with that idempotency key?

That would seem to defeat the whole purpose of idempotency keys.

> On retry, the second transaction detects the duplicate.

So at this point, the second request would return a 409 code (or something like that) to the client?

There are a lot of ways to implement this, so I posted an example with one of the most common ways - a database which uses optimistic concurrency in serializable isolation level. Postgres is often configured this way, though it's not the only way it can be configured.

With optimistic concurrency models, collisions are only detected at commit time. Two transactions can simultaneously update the same data; each update will "succeed"; when they try to commit, only the first one will succeed. The second one will fail with a code that indicates a collision. Standard practice is to just retry the transaction.
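
A toy model of that commit-time check, in case it helps. `Store`, `Transaction`, `ConflictError`, `create_payment` and `run_interleaved` are all made-up names, and the conflict detection is deliberately coarse (one version counter for the whole store). Real databases track conflicts far more finely, so treat this as a sketch of the control flow, not of any engine's internals.

```python
class ConflictError(Exception):
    """Raised at commit time when another transaction committed first."""


class Store:
    def __init__(self):
        self.data = {}
        self.version = 0  # bumped on every committed write

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_version = store.version
        self.snapshot = dict(store.data)  # view as of transaction start
        self.writes = {}

    def exists(self, key):
        return key in self.writes or key in self.snapshot

    def put(self, key, value):
        self.writes[key] = value  # buffered locally, so it always "succeeds"

    def commit(self):
        # The collision is only detected here, at commit time.
        if self.writes and self.store.version != self.start_version:
            raise ConflictError("another transaction committed first")
        if self.writes:
            self.store.data.update(self.writes)
            self.store.version += 1


def create_payment(store, key, amount, max_attempts=3):
    """Retry loop: on conflict, start over with a fresh transaction."""
    for _ in range(max_attempts):
        txn = store.begin()
        if txn.exists(key):
            txn.commit()  # nothing written, so this is a no-op
            return 409
        txn.put(key, amount)
        try:
            txn.commit()
            return 201
        except ConflictError:
            continue
    return 503


def run_interleaved(store, key, amount):
    """Two overlapping transactions writing the same key."""
    first, second = store.begin(), store.begin()
    first.put(key, amount)
    second.put(key, amount)  # both writes succeed locally
    first.commit()           # first committer wins
    try:
        second.commit()
        return "second committed"
    except ConflictError:
        retry = store.begin()  # fresh snapshot now includes the first commit
        if retry.exists(key):
            return "duplicate detected on retry"
        return "retry found nothing"
```

`run_interleaved` forces the overlap by hand; `create_payment` is what the handler would normally run, with the retry hidden inside the loop.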

In serializable isolation, every transaction sees the state of the database frozen in time at the start of the transaction. They don't see each other's writes (that would be "read committed"). So if you have two transactions simultaneously which do "check if value XYZ exists; if it doesn't exist, insert it" they will both run the insert. The collision will only be detected when the second transaction tries to commit.

There are many other ways to implement this, but this is a pretty common approach.

>> On retry, the second transaction detects the duplicate.

> So at this point, the second request would return a 409 code (or something like that) to the client?

Yes. Stripe's approach is not fundamentally different; they just look up the original request and return that response body instead of returning an error. It's more work for the server side engineers (and has a bunch of complex but obscure failure modes) but all the underlying database behavior is the same.

> There are a lot of ways to implement this

Sure, I get that. What I don't get is why you would be using idempotency keys as part of the implementation if you're going to go ahead and start a second transaction when you get a duplicate request, and not even check the idempotency key, and let your database tell you you've got a duplicate when you try to commit the second transaction. This subthread is specifically about implementations that use idempotency keys, since that's what the article is about.

Update: stickfigure basically answered this in another subthread: as I understand the answer, it is that you do check the idempotency key, but inside the DB transaction, so you have to start a new transaction on every request. If, inside the transaction, your idempotency key check shows that another request with that key was already received, you don't do anything to change the DB inside the transaction and just commit it as a no-op.
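
A rough sketch of that "check the key inside the transaction" shape, using Python's built-in `sqlite3` purely so it runs anywhere. The `payments` table, its columns and the `create_payment` helper are assumptions for illustration. SQLite also serializes writers with locks rather than optimistic concurrency, so this shows the per-request flow only, not the Postgres behavior discussed in this thread.

```python
import sqlite3


def open_db():
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE payments ("
        "idempotency_key TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    return conn


def create_payment(conn, key, amount):
    """Every request opens a transaction; duplicates commit without changes."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT 1 FROM payments WHERE idempotency_key = ?", (key,)
        ).fetchone()
        if row is not None:
            conn.execute("COMMIT")  # key already seen: commit as a no-op
            return 409
        conn.execute(
            "INSERT INTO payments (idempotency_key, amount) VALUES (?, ?)",
            (key, amount),
        )
        conn.execute("COMMIT")
        return 201
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

The primary key on `idempotency_key` is a backstop: even if the in-transaction check were skipped, a second insert with the same key would fail rather than create a second payment.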

> a database which uses optimistic concurrency in serializable isolation level. Postgres is often configured this way, though it's not the only way it can be configured.

It's not the default (read committed is) and I never saw serializable being set in actual production systems. You can do it, but then you have to be able to retry all of your transactions, including read.

What if the task you do takes 5 minutes? 30 minutes? 10 hours? Do you create a long transaction, blocking all reads?

> It's not the default (read committed is) and I never saw serializable being set in actual production systems.

It's not the common mode of deployment, but it's definitely in prod use.

> You can do it, but then you have to be able to retry all of your transactions, including read.

Pure read transactions shouldn't need to be retried in postgres due to serialization errors. You need to have read-write dependencies for that.

That's not to say that effectively read-only transactions aren't affected by serializable; you do need to record the necessary metadata for the serialization logic to work.

FWIW, if you know your transaction is read only and long running, you can start a transaction with START TRANSACTION READ ONLY DEFERRABLE, which makes the start transaction slower, but then does not need to do any work related to serializable while the transaction is running.

> I never saw serializable being set in actual production systems

Every major prod system I've worked on in the last 15 years ran in serializable, including my current charge which processes tens of billions of dollars annually. YMMV but this is quite common in serious production systems. Google's Spanner only runs in serializable.

It doesn't matter though. I could write the sequence out with a SELECT FOR UPDATE and the second request will block instead of retry. The client experience is the same; the "second" request blocks. @pdonis wanted an example so I picked one.
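
For the blocking variant, here is an analogy in plain Python threads rather than SQL. A per-key `threading.Lock` plays the role of the row lock that SELECT FOR UPDATE would take; `BlockingPaymentService`, `run_concurrently` and the `work_seconds` delay are hypothetical names made up for this sketch, which assumes a single process and is not how a database implements row locks.

```python
import threading
import time


class BlockingPaymentService:
    """Hypothetical service where a per-key lock mimics a row lock."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._key_locks = {}
        self._payments = {}

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def create_payment(self, key, amount, work_seconds=0.05):
        with self._lock_for(key):  # the "second" request blocks here
            if key in self._payments:
                return 409  # the first request already finished
            time.sleep(work_seconds)  # simulated processing time
            self._payments[key] = amount
            return 201


def run_concurrently(service, key, amount, callers=2):
    """Fire several identical requests at once and collect their statuses."""
    results = []

    def call():
        results.append(service.create_payment(key, amount))

    threads = [threading.Thread(target=call) for _ in range(callers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)
```

Whichever thread gets the lock first does the work; the other waits, then finds the key already recorded. From the client's side that is the same outcome as the retry version: one success, one duplicate.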