Hacker News new | ask | show | jobs
by MichaelSalib 4889 days ago
I think it is extremely hard to reason about data integrity properties in non-ACID systems.

Consider a banking application that uses linked accounts such that transactions fail if the combined balance of linked accounts is below zero. If I implement this in Oracle using Snapshot Isolation (the highest isolation level Oracle offers) in the obvious manner, I'll get silent constraint violations: transactions that should fail will succeed because MVCC can't stop two competing transactions debiting the same account pair (i.e., I have two linked accounts A & B with $100 in each and I launch two simultaneous transactions to transfer $150 away from both: afterwards, I'll end up with -$50 in both or some garbage data).

Now, you can fix if you recognise the problem in advance by doing a select-for-update or changing your schema to materialize the constraint (say by creating a linked-balance table that holds the combined balance for all pairs of linked accounts).

But it is really hard to even notice the problem, especially if you've got a few dozen tables with multiple applications writing to your database (RDBMS advocates insist that this is a good thing). And there are no automated methods for determining when this will be a problem: you just get silent data corruption or silent constraint violations in your extremely expensive "ACID-compliant" (but not really) Oracle database.

2 comments

While I agree that it's a hard problem, I'm not sure I'd agree that it's a hard problem to notice. It's such a common problem for anyone dealing with a relational database that I (and pretty much every engineer I work with) would recognize that sort of a problem immediately, in the same way that I always have app-level threading concerns in the back of my head when I'm working in a language like Java. I just take those sorts of things as a given, and my experience working with other engineers is that anyone else who's had to deal much with SQL has had to think about it as well. You develop a sense of when you need to worry about race conditions and when you don't.

Pretty much every ORM solution has some mechanism for dealing with this sort of a problem, as well. For example, the ORM we've built internally uses optimistic locking in most cases, which at least simplifies the problem space that our engineers have to worry about (i.e. they only have to worry about conflicts that can happen due to updates to two different rows, but any conflicts due to an update to the same row are automatically detected).

I'm also firmly in the "having multiple applications writing to the same database is sheer madness" camp, due to these (and other problems).

Again, I'm not saying that these things aren't annoying to think about, just that I think lots of people are used to thinking about them at this point, and it's a pretty well-understood and often well-communicated problem space with a set of known solution patterns, many of which are well-supported by ORM frameworks.

"MVCC can't stop two competing transactions debiting the same account pair (i.e., I have two linked accounts A & B with $100 in each and I launch two simultaneous transactions to transfer $150 away from both: afterwards, I'll end up with -$50 in both or some garbage data)."

That isn't true. In a pure MVCC world you could detect such issues, specifically that two trxs were opened and they both tried to update the same row. This would generate a conflict and one transaction could just get shot. Snapshot-isolation/MVCC describes this issue.

However what InnoDB/mysql and Clustrix actually do is use MVCC for read-write conflicts and two-phase-locking for write-write conflicts which trivially fixes your proposed problem.

To clarify the example, the two simultaneous transactions are updating different rows: one updates account A's row while the other updates account B's. There is no way for MVCC to recognize this conflict: it will happily corrupt your data.
Sorry, I'm not following. If you have 2 rows storing how much money you have in 1 account then your data isn't normalized and serializability isn't going to solve your problem, if you could be (even) more explicit, I might be able to answer.
I think we're conflating Snapshot Isolation and MVCC (e.g., Snapshot-isolation/MVCC). MVCC is a general concurrency control mechanism, not an isolation level. Coupling MVCC and Snapshot Isolation is like saying "using locks provides serializability," which is not true in general--it depends on how you use the locks.

That said, Snapshot Isolation can lead to anomalies like Example 1.2: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.142... If you perform additional synchronization (e.g., 2PL, as you describe), then you can get serializability.

Yes, you're totally correct. I was thinking of SI.
The idea is that each account has a balance. Say I have a checking and savings account: there's one row for each and the bank will kindly let either of them get a negative balance provided the sum of both balances is positive.

You may not care for this schema design, but I think most people would be surprised that Oracle silently corrupts data and violates constraints here.

I'm not sure that people who use relational databases would be surprised that there can be data-level race conditions. For example, if you write code like: 1) User requests transfer in the UI 2) Query the DB to see if the transfer is legal 3) If so, update the database.

you have to think about race conditions even if the database implements serializability. First of all, you have to have enough of a clue to do steps 2 and 3 inside the same database transaction, or else the isolation level doesn't help you in the least. Everyone who deals with an RDBMS has to worry about transaction boundaries all the time anyway to avoid race conditions, so it's not too much of a leap to assume that people who are used to worrying about transaction boundaries will have to have some idea of what kind of problems those transaction boundaries will prevent for them. Most databases in production systems don't do full-on serializability as an isolation level (and if they do, it's too expensive to turn on), so pretty much every engineer's real-world experience is that if they do a query against table A, then an update against table B based on that data, that they can have a race condition and they need to lock A or materialize potential conflicts.

I guess it's possible that my personal experience or the engineers that I work with are not a representative sample, but if you're working with sensitive issues like money transfers you tend to be really, really super-paranoid about it, and there's no way that something like that is overlooked by a halfway competent team.