by ztorkelson 2948 days ago

In my experience, many useful invariants cannot be trivially expressed in a singular "target field" against which one may lock, but that's actually beside the point.

The point is that--insofar as it comes to correctness--serializability allows local reasoning, whereas weaker isolation levels require global reasoning. You can't argue against that by saying "well, if every transaction just explicitly locks the appropriate records at the appropriate times …" because that's exactly the type of global reasoning that we're trying to avoid.
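To make the local-versus-global distinction concrete, here is a minimal sketch of a write-skew anomaly, assuming a toy in-memory "database" (a plain dict) rather than any real engine. The on-call scenario and the names `go_off_call`, `run_interleaved`, and `run_serial` are hypothetical. Each transaction is correct when considered alone, yet the interleaved run breaks the invariant "at least one doctor stays on call".

```python
def go_off_call(snapshot, doctor):
    """Transaction body: leave the rota only if someone else remains on call.

    Returns the writes to apply, based solely on what this transaction read.
    """
    others_on_call = [d for d, on in snapshot.items() if on and d != doctor]
    return {doctor: False} if others_on_call else {}


def run_interleaved(db):
    # Both transactions read the committed state before either one writes,
    # an interleaving that weaker isolation levels permit.
    writes_a = go_off_call(dict(db), "alice")
    writes_b = go_off_call(dict(db), "bob")
    db.update(writes_a)
    db.update(writes_b)
    return db


def run_serial(db):
    # A serializable schedule is equivalent to running one after the other.
    db.update(go_off_call(dict(db), "alice"))
    db.update(go_off_call(dict(db), "bob"))
    return db


weak = run_interleaved({"alice": True, "bob": True})
serial = run_serial({"alice": True, "bob": True})
print(weak)    # nobody left on call: invariant violated
print(serial)  # bob stays on call: invariant holds
```

Nothing in `go_off_call` itself is wrong; the bug only appears once you know which other transactions may run concurrently, which is the global reasoning at issue.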

There certainly is a difference between theory and practice here, but I don't think it's in favor of weaker isolation levels.

In theory, as you note, one can enforce any invariant–even in Read Committed–by taking a global lock in every transaction. In practice, nobody does that, because it takes concurrency to zero.

In theory, one can regain concurrency by judiciously breaking that global lock into some application-specific hierarchy of locks, and by having the discipline to adhere to that bespoke locking protocol for every transaction, now and in perpetuity. In practice, that is both too expensive and too error prone to be applied successfully in any non-trivial system (though some do still try).
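As an illustration of what such a bespoke protocol involves, here is a rough sketch of an application-level lock hierarchy, assuming in-process `threading.Lock` objects stand in for whatever locks the database or application would really use. The resource names, `LOCK_ORDER`, and the `hold` helper are all hypothetical.

```python
import threading
from contextlib import ExitStack, contextmanager

# Hypothetical hierarchy: locks must always be taken in this order.
LOCK_ORDER = ["accounts", "inventory", "orders"]
LOCKS = {name: threading.Lock() for name in LOCK_ORDER}


@contextmanager
def hold(*names):
    """Acquire the named locks in hierarchy order and release them on exit."""
    ordered = sorted(set(names), key=LOCK_ORDER.index)
    with ExitStack() as stack:
        for name in ordered:
            stack.enter_context(LOCKS[name])
        yield ordered


# The caller lists what it needs in any order; the helper enforces the hierarchy.
with hold("orders", "accounts") as acquired:
    held_inside = [name for name in LOCK_ORDER if LOCKS[name].locked()]

held_after = [name for name in LOCK_ORDER if LOCKS[name].locked()]
```

The helper only protects transactions that actually call it with the complete set of locks they need. A single code path that forgets a name, or bypasses `hold` entirely, silently voids the guarantee, which is where the discipline cost comes from.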

In theory, these concerns can be addressed at the application level. In practice, it is grossly irresponsible that we (as an industry) continue to foist these concerns on application developers who–in the aggregate–cannot reasonably be expected to get it right. The number of viable database systems is small and growing slowly, while the number of database-backed applications is large and growing rapidly. To have any hope of improving the status quo, well: I know where I'd put my money.

trhway 2948 days ago

>In my experience, many useful invariants cannot be trivially expressed in a singular "target field" against which one may lock

just trivially dedicate/associate a field with each invariant :)

>The point is that--insofar as it comes to correctness--serializability allows local reasoning, whereas weaker isolation levels require global reasoning. You can't argue against that by saying "well, if every transaction just explicitly locks the appropriate records at the appropriate times …" because that's exactly the type of global reasoning that we're trying to avoid.

The reasoning required is the same in both cases. To avoid serialization conflicts you still have to identify all those appropriate records and appropriate times in your reasoning. So the choice is either to explicitly implement the required locking for RC or to let SERIALIZABLE do it implicitly under the hood (and hope that it will do it better :).

>one can enforce any invariant–even in Read Committed–by taking a global lock in every transaction. In practice, nobody does that, because it takes concurrency to zero.

in a naive implementation of what I said (in practice, of course, it is more complex and better as a result), the worst-case concurrency is the number of disjoint sets of invariants. And SERIALIZABLE would achieve about the same concurrency or worse.

>In practice, that is both too expensive and too error prone to be applied successfully in any non-trivial system (though some do still try).

in my experience any non-trivial system wouldn't run in SERIALIZABLE without hitting the conflicts until the system gets subjected a lot to that "global reasoning" that you mentioned. And flushing out all those cases takes a lot of time and effort, because it is "global" reasoning about implicit locking and relations. The explicit lock-based model for RC is a simpler mode of thinking and thus much more practicable and trackable. And this is why we all run RC. At least it runs, and if some reasoning was put into it, it runs correctly, fast, and scalably.

ztorkelson 2948 days ago

>The reasoning required is the same in both cases. To avoid serialization conflicts you still have to identify all those appropriate records and appropriate times in your reasoning.

Well, sort of. Serializability allows you to reason locally for correctness, but not for performance. That is, however, a meaningful distinction.

Serializability remains a marked improvement over the status quo. Most operations aren't actually performance sensitive, so that tradeoff makes perfect sense. With serializability, we need only focus our efforts on those operations which are both performance sensitive and highly contended.

This has nice emergent properties, because the problematic transactions are immediately apparent: they are the slow ones, the ones that are not meeting their performance budget.

Performance anomalies are directly measurable and immediately visible. Correctness anomalies are generally neither.

Performance anomalies are generally localized and ephemeral. Correctness anomalies are often persistent and viral.
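The claim that contended transactions announce themselves can be sketched as a retry wrapper that counts conflicts per transaction. This is a minimal illustration, assuming a hypothetical `SerializationFailure` exception in place of whatever error a real driver raises; `run_serializable` and `make_contended` are likewise invented for the example.

```python
from collections import Counter


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver's serialization-conflict error."""


retry_counts = Counter()


def run_serializable(name, txn, max_attempts=5):
    """Run txn, retrying on conflict and recording each retry under name."""
    for _ in range(max_attempts):
        try:
            return txn()
        except SerializationFailure:
            retry_counts[name] += 1
    raise RuntimeError(f"{name}: gave up after {max_attempts} attempts")


def make_contended(failures):
    """Build a fake transaction that conflicts a fixed number of times."""
    remaining = [failures]

    def txn():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise SerializationFailure()
        return "committed"

    return txn


result = run_serializable("transfer", make_contended(2))
hot_spots = retry_counts.most_common(1)  # the most contended transactions
```

The counter gives a direct list of where to spend tuning effort, whereas a lost invariant under a weaker level leaves no comparable signal.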

>So the choice is either to explicitly implement the required locking for RC or to let SERIALIZABLE do it implicitly under the hood (and hope that it will do it better :).

Again, when it comes to correctness, there's no real hope to do better than serializable. When it comes to performance: sure. But in every application domain I've worked, getting to a wrong result faster is rarely useful. (I understand that this is not a universal truth, but I believe it is significantly more common than the alternative. The adage "first make it work, then make it fast" comes to mind.)

>in my experience any non-trivial system wouldn't run in SERIALIZABLE without hitting the conflicts until the system gets subjected a lot to that "global reasoning" that you mentioned.

I agree that there have been (and still are) practical impediments to deploying large-scale serializable systems with incumbent technologies. But I disagree that those impediments are inherent to all implementations of the serializable isolation level, which is why these kinds of articles (and the R&D behind them) are so important.

We still have a long way to go before serializable-by-default is a tenable option for the most demanding of systems, but that is what we should be working towards.