Hacker News new | ask | show | jobs
by kerblang 2610 days ago
What I'm talking about here is not academic database technologies in laboratories, but the typical mysql/postgres/oracle/sqlserver production world regular programmers live in. So yes, I know pragmatic strategies for avoiding deadlocks, but in all the databases I've used, you have to proactively employ such a strategy. Many if not most of these programmers have at best a rudimentary knowledge of transactions and/or whether they are even using them (thanks to ORMs). My concern is that someone reads a blog like this, concludes that serializable is best, and proceeds towards disastrous consequences.
4 comments

You appear to suggest that programmers with “rudimentary knowledge of transactions” should prefer lower isolation levels which sacrifice correctness for performance. If anything is “grossly irresponsible” here, it’s that.

Such isolation levels are notoriously difficult to reason about—even for experienced practitioners—and their misuse can and does introduce persistent data anomalies that can be costly to remediate.

Generally speaking, performance issues are significantly easier to diagnose and resolve than data anomalies, and they may be address in a targeted fashion as the need arises.

There’s no substitute for thinking. But if I had to prescribe general advice, it’d be this:

(1) When given the choice, select a modern database system which supports scalable and efficient serializable and snapshot transaction isolation levels.

(2) Use serializable isolation, by default, for all transactions.

(3) In the event that your transactions are not sufficiently performant, stop and investigate. Profile the system to identify the bottleneck.

(4) If the bottleneck is contention due to the transaction isolation level, stop. Assess whether the contention is inherent or whether it is incidental to the implementation or data model.

(5a) If the contention is incidental, do not lower the isolation level. Instead, refactor to eliminate the contention point. Congratulations; you are now done.

(5b) Otherwise, lower the isolation level—only for one or more of the transaction(s) in question—by a single step. Carefully assess the anomalies you have now introduced and the ramifications on the system as a whole. Look for other transactions which could intersect concurrently in time and space. Implement compensatory controls as necessary to accommodate the new behavior.

(6) Repeat only as necessary to achieve satisfaction.

As others have commented, most widely deployed DBMSs provide deadlock-free serializability. FaunaDB is a commercially available implementation of Calvin that makes deadlock free serializable transactions performant in a global environment. The technology exists, and I think your concerns are unwarranted. The consequences of performance issues are far less severe than data correctness bugs, so preserving correctness is a much better starting point.
In my experience, novices generally stick with the default. And when they don't, their code breaks because they assumed serializable was somehow magic and they don't handle the inevitable transaction failures (generally on production, when there is load). This is exactly why PostgreSQL drivers that used serializable as a default had to backtrack and default to read committed. And yes, I agree that the blog is encouraging serializable without adequately pointing out the downsides; th extra burden on application developers would be #1 in the caveats in my book, and using serializable when you don't have to will lead to a net increase in bugs.
Those databases all allow and detect deadlock as an optimistic concurrency strategy out of the box.