|
|
|
|
|
by stubish
12 days ago
|
|
The article doesn't mention the biggest problem with serializable isolation. At every commit, you need handle the possibility of a serialization exception and retry the transaction. Traditionally devs and frameworks don't, so your application works fine during development and staging but starts failing under load. It makes commit failures normal, rather than an 'oh shit' problem because your disk has filled or someone has tripped over a network cable. And how do you retry transactions? Then you hit another issue when using multiple datastores, where you need to learn about two-phase commit and the joys of manually keeping datastores in sync that don't support it (eg. filesystems). And the locks, if you dare run batch updates along with web requests. The long running transactions lock everything they read, blocking short transactions. Because that is exactly what you asked for. Again, you will miss this during development and only notice under load. So sure, you might avoid some data consistency issues if your data model and update patterns hit the edge cases. In practice, the reason details about serializable are not well known is the cases are rare. Using it gives you safety (maybe that rare case is your case!), but everything needs to be carefully designed around it. |
|
https://docs.yugabyte.com/stable/explore/transactions/isolat...
So they do allow write skew anomalies.
This happens because serializability requires the database to track every read and intersect them with every write, meaning every node in a distributed cluster must be aware of what every other node is doing. That hurts scalability and performance badly; it's one reason Oracle doesn't support true serializability (the other being that almost no existing DB backed apps can tolerate transaction aborts, as you observe).
There are other issues too. With pessimistic locking you can get deadlocks, but if you deadlock your app the database can detect that and get it moving again by rolling one of the transactions back. It surfaces to the user as a bug that they can easily monitor for and diagnose. With optimistic locking+retry, which serializability basically requires, you can get livelock. Livelock is terrible, the cluster appears to be busy doing useful work but just gets slower and slower. There's not good ways to automatically detect it, at least not in any app framework I've ever encountered.
Disclosure: in the Oracle DB group, opinions are my own and not that of the company. I actually argue with my manager about this topic sometimes, lol.