| > So the idempotency key protects against faults on the network between the client and stripe's idempotency layer, but not against stripe-internal faults between the idempotency layer and the application. Is that the case? Yes, in practice this is the case — the idempotency key insertion could succeed and then subsequent queries fail. Practically though, it doesn't happen very often — if a request gets as far as the idempotency layer successfully, the rest of it tends to work too. Where it doesn't, Stripe pays a lot of attention to 500s, especially where those pertain to charges, and a lot of time and energy is spent cleaning up state that might have resulted in an invalid transaction. > Why is the idempotency not achieved by using an idempotent database transaction or atomic operation? The most honest answer is that Stripe wasn't built on a data store where transactions are supported, so it wasn't even a possibility until quite recently, and by then the system was already well established in its current form. Beyond that though, once your requests are making their own requests to modify state in foreign systems (which is happening at Stripe in the form of banks, partners, internal systems, etc.), a single transaction isn't enough to keep things entirely in order anymore because it can't roll back that remote state. It is still possible to build a very robust system that is transaction-based, but it becomes a much more complex problem than a simple `BEGIN`/`COMMIT`. |
It probably happens much more often than you think if you say this as someone with an insider perspective. I have had to integrate with banking systems and payment systems that use this approach, and it is extremely frustrating and comes of as a way of offloading work to the client. If a payment capture succeedes but subsequently always returns 500 an api client has to first query the status and then execute if-then logic for something that could easily just have been a retry of the request (3x the logic). This is acceptable since there are a million other ways to mess up idem potency so integrations kind of end up this way anyway. But the worst part of such an approach is debugging the issues as a client. A client cannot see the internal logs and therefore have to call support which will ALWAYS answer «the transaction seem fine with us» and basically just close the case to protect their KPIs. Im pretty sure this type of issue has a name but cannot find it (state client see dont match the real state). I dont mean to offend the work people put into these APIs, but I cannot see the good qualities of this approach other than saving development hours (and possibly saving one db query, but as an effect you get a status request plus another capture request as a workaround from your clients).