Hacker News new | ask | show | jobs
by stickfigure 39 days ago
That's usually solved with traditional database transactions.

Even if you have a complex long-running multistep orchestration problem, you can break it down into simpler transactions. Eg you could start with a "lock the resources" txn.

But 99% of these conversations around idempotence are simple POST operations like "create order" that regular old database concurrency management handles just fine.

2 comments

> That's usually solved with traditional database transactions.

That doesn't answer my question. What response do you return to the client in the case I described?

This is just normal concurrent programming? If two requests come in for the same idempotency key/customer reference id, only one will succeed. Use standard database transaction isolation.

So one will complete with 200, one will complete with 409. It doesn't matter which.

That said, there's something odd about the way you phrased this question. If the original request hasn't gotten a response yet, why is it sending a retry? What you're asking is more general: What happens when two conflicting requests come in? This is something we've been solving with RDBMSes since the 1970s.

> That said, there's something odd about the way you phrased this question. If the original request hasn't gotten a response yet, why is it sending a retry?

Because it hasn't gotten a response yet. That's got to be far and away the most common reason any request gets retried in any context.

So you have to serialize the requests, and have one of them wait for the other to finish to return the 409?

> why is it sending a retry?

may be two clients tries to do it? Or there's a bug with the client in how they do it?

Isn't the point of idempotency meant to enable clients to retry again, without fear that a 2nd request somehow breaking things?

By definition we have to serialize something somewhere so we can decide which request is the success and which request is the duplicate. There's nothing special about the case of retries, this is standard concurrent programming 101. Two conflicting requests come in, which one wins?

You absolutely must wait for one request to finish before any other request can return a 409. 409 is a signal to the client that they can stop retrying, the job is done. If some request returns 409 early and the "original" request fails, you will not get further retries and the message will be lost.

Stripe's approach requires serialization as well. Only one request can succeed. If you send multiple conflicting requests in simultaneously, some of those have to block.

The good news is that we have been solving this problem for decades and we have incredibly well refined tools - database transactions and isolation levels - for solving this problem.

In my opinion, the idea of idempotency is to accept both requests, but only one is actioned (and the requester is non-the-wiser about which). Otherwise, you're just recreating database transactions - something that doesn't need to be named idempotency.

And you haven't considered multiple servers in your scenario - what if two requests meant to be idempotent with each other arrived at different servers?

How would the requester get notified if it doesn’t know which request succeeded? Is it listening for events?

And at the sake of repeating the above commenter, you solve the multiple server by serializing somewhere, because you ultimately need a lock on something. You can also perform the operation in both places and then reconcile the state later but that’s a lot more complex.

> So you have to serialize the requests,

Not necessarily - there are different transaction isolation and conflict resolution methods provided by every database built for this purpose. You just have to ensure that only one request actually commits to the database, and that one sends a success response while the other sends a 409. The database or another lock provider can either help enforce serialization up-front - or the app can use optimistic locks based on data in the request that will only block if there is actually a conflict, and this won't delay the first transaction at all.

Solving these kinds of issues are exactly the purposes of idempotency keys and database transactions and using them in the intended way is really the only sound way to build a distributed system. Making things more complicated to "improve DevX" is just going to make them unsound. That is what Stripe chose to do. Their 24-hour replay idea is fine but why not send 409s after that rather than accept those transactions? If "that will never happen" then the 409s will never happen. It would have cost approximately nothing (if designed that way upfront) and inconvenienced their clients not at all.

> If the original request hasn't gotten a response yet, why is it sending a retry?

Um, because connections over the Internet aren't 100% always on? Because packets can get lost? Because computers sometimes have to reboot?

You're assuming that the client will always receive whatever response your server finally sends, and that the client will wait indefinitely to receive a response. Neither of those things are true. So the client can be in a state where it sends a retry because it got no response and doesn't know why. And that means a retry request could come in while the first one is still being resolved--because the client had a timeout or it rebooted or something else happened that made it lose the connection state it previously had. That's the case I'm asking about.

I'm not assuming anything. Let me try to reframe this for you.

The case of "client sends a retry with the same idempotency key" generalizes to "multiple requests come in for the same idempotency key". These can come in spread out over time (like a traditional loop), or they could come in at once. The solution is the same either way.

The problem of "how do we deal with multiple conflicting requests coming in at once" is something we have been dealing with for decades. We have databases with transactions and isolation levels. If I said in an interview "make an endpoint that inserts a value in a database and returns an error if the value is a duplicate", any competent backend web developer should be able write it without Claude's help. Concurrency is part of our life.

Whether you want to return 409 or replay the success is irrelevant to this question. You must serialize the idempotent operation on the server, because you can have multiple requests coming in simultaneously. If you put the operation in a database transaction with an appropriate isolation level, you are most of the way there.

> I'm not assuming anything.

Sure you are. You said:

"Retries will only receive 409 if the original request was successful. If the original request failed, the server performs the operation as normal on the second request. It doesn't replay failures."

I understand all that just fine; you don't need to keep trying to "reframe" it. But what you said that I just quoted above assumes, implicitly, that if you get a second request with the same idempotency key, the original request has either failed or succeeded--because you don't even address the case where neither of those things are true. I'm asking you to address that case.

If your answer is "that will never happen", I disagree, and I explained why in response to your question about why the client would send a retry if it hasn't received a response to the original request. You could answer, I guess, that you still think that would never happen--and I would still disagree. But at least that would be an answer. So far all you've done is "reframe" something that I already understand and wasn't asking about.

The other cases are the original request is still in flight or never occurred. The former case was explained by the prior comment, one request is processed, the other is returned by 409. The system cares little for which is which and neither should the caller. The latter case is handled by clients retrying until a request is received, at which point one of the other three states takes over.

Whether or not a prior request exists in the system in processed or unprocessed state should not matter in a properly implemented idempotent system, the whole point is that one and only one is processed, and all replicas indicate that they are such.

What you do inside of your boundary to implement that idempotent contract need not be part of the contract and the decision of what primitives to use (locking, content-based addressing etc) are mainly just a question of implementation constraints.

You're asking questions that broadly summarize as: "what if we had an idempotency key [which must work concurrently to be useful], but it didn't work concurrently?"

Idempotency keys are themselves the solution you're looking for. If they don't work concurrently, they aren't idempotency keys. Your response in races or duplicates doesn't inherently matter in that sense, pick whatever semantics make sense for your system.

> You're asking questions that broadly summarize as

No, I'm asking one question, which doesn't seem to be summarized by your summary.

The situation is that your server has received two requests with the same idempotency key. For the first request, one of three things could be true: it could have succeeded, it could have failed, or it could still be in process.

The original post I responded to said what response the second request gets if the first request succeeded and if it failed. But it didn't say what response the second request gets if the first request is still in process on the server--so it hasn't succeeded and it hasn't failed. I do not see an answer to that anywhere in this thread.

If the first is still in progress and a second arrives, you can wait for it to finish (and then return whatever would be useful, e.g. you could replay the first request's response), or fail the second (409, 423, 429, there are a number of codes that could apply) to tell the caller to retry later.

So yeah, you can do basically anything that isn't inconsistent. Success, fail, delay, don't respond until timeout, all are valid as long as you don't double-apply. Most concurrent systems are like this in some way, because all successes can become errors, and all responses might never arrive. It has nothing really to do with idempotency.

The lock would normally make the second request wait, aka not return a response, until the first one is done. Then it sees it's a duplicate and returns that. Or it times out and returns an error. Then the client hopefully have some exponential back off strategy, so the third attempt doesn't suffer the same fate.
If a transaction is locked then subsequent requests would return a 409, ideally with an error message indicating that it's currently being processed.
To be honest, I liked your original response about returning a 409 - it's not something I'd done before and I like how it keeps things simpler.

But your follow up responses here are making me rethink. Now you have to have all these special cases where the original request is still in process. I think or assertion of "99% are simple POST operations" is bullshit. For the times where idempotency is hard and really matters, often times you're calling a third party API, like a payment processing API.

I would think a better approach would be to always return a 409 on a subsequent request, regardless of whether it passed or failed, and then have a separate standard API that lets you get the result of any request by its idempotency key.

Idempotence already requires some client thinking if you want to conform to HTTP specs.

I.e. idempotent DELETE with proper protocol behavior requires that one request see the 200 OK or 204 No Content and the other sees 404 Not Found, because the delete has already happened. It would be misleading to say 200 OK to both, because that answer means the resource was there when the request arrived.

Honestly, the whole HTTP resource model has a different conceptual backing for state management than the independently developed "idempotence" concepts in distributed systems. Those non-HTTP concepts came from more message-based rather than resource-based architectural assumptions.

The cleanest mapping in the spirit of HTTP would be that you do multiple round trips. A POST creates a new idempotence context, a bit like "start a transaction". The new URI is the key for coordinating state change and allowing restart/recovery.

As I remember it, the idea of idempotence keys in headers really came from the SOAP RPC mindset. It's kind of funny to see it persisting in some hybrid SOAP + REST mental model.

> The cleanest mapping in the spirit of HTTP would be that you do multiple round trips. A POST creates a new idempotence context, a bit like "start a transaction". The new URI is the key for coordinating state change and allowing restart/recovery.

I think that gave me "Enterprise Java Beans PTSD". I.e. an over-engineered solution that adds complexity for both the client and server in the name of some sort of "protocol purity".

People bolted on idempotent semantics onto HTTP because it wasn't provided natively by the protocol, so I don't think it makes sense to go through some hoop-jumping gymnastics for the sake of conforming to a spec that doesn't describe the necessary semantics in the first place.

FWIW, I'm not particularly fond of HTTP. But there is PTSD in both directions. Doing random things ignoring (or subverting) "protocol purity" often create disastrous effects when they haven't considered how the larger system will behave when you have various middleware bits that are essentially obeying different protocols while superficially claiming to use an interoperable standard.

When I let myself ruminate, it irks me that we all let HTTP become the defacto "internet protocol" just because of firewalls. Because there was a cargo cult idea that HTTP is benign and so one of few ports allowed almost everywhere, we do stupid contortions to squeeze every protocol through an HTTP tunnel.

These short-sighted acts of laziness accumulate into HTTP everywhere. And of course, the firewall is nearly pointless when "everything" is going through that one hole anyway.

> special cases where the original request is still in process

This isn't a special case, and it's the same problem if you want to replay the original response on conflict. If the original request isn't complete, what are you going to replay?

> If the original request isn't complete, what are you going to replay?

Who says you have to replay? If you get a second request with the same idempotency key, and the original request is still in process, why not just send the client a response that says so?

It's not a terrible question. It would be complicated to implement and isn't more useful than using the database's consistency model, so nobody does it that way.

Long running transactions create all sorts of problems, so transactions are generally expected to be short. The actual work behind "create payment" or "create order" is generally fairly trivial - more or less insert a row in a table. There's no good reason to make the API complicated... you either "win" at concurrency or you lose, and the difference is generally sub-millisecond. The only meaningful thing you need to communicate to the client is "you're done" (for both the win and lose cases) or "you need to try again" (for the "something unexpected went wrong" case).

Complicated workflows can certainly have multiple steps, with "fetch the current status" calls in between. But somewhere near the beginning of every complicated workflow there will be a call to "create workflow" and it will need to have sort of mechanism which allows clients to call it idempotently. Otherwise you end up with multiple starts.

I've literally received duplicate products in the mail because of this kind of problem. I've also sent multiple products in the mail because services I relied on didn't offer the necessary idempotency mechanisms.

> The actual work behind "create payment" or "create order" is generally fairly trivial - more or less insert a row in a table.

It's generally insert a row in someone else's table, over the wire, 50ms+ away. They might not even be using an RDBMS.

That is my point. When you are doing "normal" idempotency where you do the appropriate locking and keep around a table with ongoing request status and the result that you can return on a subsequent duplicate request, you handle all these cases. But in your "409" version of it, you haven't really saved much complexity on the server because you still need to keep around all that info if you're not just returning a 409 if you get a second request while the first is in progress.
I don't understand what you're saying. I can take any cheap rdbms, put a unique constraint on a column, and make my API return 409 for conflict vs 200 on success. There's so little code involved that it's embarrassing to charge money for it.