Hacker News
by jwr 1456 days ago
Some of the explanations are questionable: I think they were overly simplified, and while I applaud the goal, some things just aren't that simple.

I highly recommend reading https://jepsen.io/consistency and clicking on each model on the map. This is the best resource I found so far for understanding databases, especially distributed ones.

2 comments

> Some of the explanations are questionable: I think they were overly simplified, and while I applaud the goal, some things just aren't that simple.

I am an expert on the subject matter, and I don't think that the overall approach is questionable. The approach that the author took seems fine to me.

The definition of certain basic concepts like 'consistency' is even confusing to experts at times. This is made all the more confusing by introducing concepts from the distributed systems world, where consistency is often understood to mean something else.

Here's an example I'm familiar with, where an expert admits to confusion about the basic definition of consistency in the sense that it appears in ACID:

https://queue.acm.org/detail.cfm?id=3469647

This is a person who is a longtime peer of the people who invented the concepts!

Not trying to rigorously define these things makes a great deal of sense in the context of a high level overview. Getting the general idea across is far more important.

I would love the feedback: what was questionable? Striking the balance is tough. Jepsen's content is great.
Everyone can disagree on precisely where to slice "this is beginner content" from "this is almost-beginner content". I could stick my own oar in here, but I won't.

I think your level of abstraction is quite good for the absolute beginner asking "what on earth are people talking about when they use that 'database' word?". With an extremely high-level understanding, when they encounter more detail they'll have a "place to put it".

One thing that can be surprising is that for "REPEATABLE READ", not all "reads" are actually repeatable.

There are at least two ways (that I'm aware of) that this can be violated. For example, if you run an update statement like this:

    UPDATE foo SET bar = bar + 1
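Then, in engines where UPDATE reads the latest committed row rather than the transaction's snapshot (MySQL's InnoDB is one), the read of "bar" will use the latest value, which may be different from the value other statements in the same transaction saw.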
Not sure what you're claiming here...

Repeatable read isolation creates read locks so that other transactions cannot write to those records. Of course our own transaction has to first wait for outstanding writes to those records to commit before starting.

Best as I know the goal is not to prevent one's own transaction from updating the records we read; the read locks will just get upgraded to write locks.

> Repeatable read isolation creates read locks so that other transactions cannot write to those records.

No it doesn't, that's just one possible implementation strategy. Postgres for example does not do this.

> Best as I know the goal is not to prevent one's own transaction from updating the records we read;

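I'm talking about updates from other transactions. With REPEATABLE READ, two clients can run the following transaction concurrently:

    BEGIN;
    SELECT bar FROM foo WHERE id = 1;  -- Returns 0
    UPDATE foo SET bar = bar + 1 WHERE id = 1;
    COMMIT;
Both clients can see a value of "0" from the first SELECT. What happens next depends on the engine. In MySQL (InnoDB), the UPDATE reads the latest committed row rather than the transaction's snapshot, so after both COMMIT the value of "bar" will be "2". ie. the "read" of "bar" in "bar = bar + 1" for one of the transactions does not use the snapshot. In Postgres, the second UPDATE instead waits for the first transaction to finish and then fails with "could not serialize access due to concurrent update", so that client has to retry the whole transaction.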
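
To make the scenario above concrete, here is a small Python sketch of a toy single-row store. It is not any real database's implementation; the names `Store`, `Txn`, `run_scenario` and the two policy labels are made up for illustration. Under "current_read" the UPDATE reads the latest committed value instead of the snapshot, so both increments land. Under "first_updater_wins" the second writer aborts when its snapshot is stale. Which policy a given engine follows at REPEATABLE READ is worth checking against its own documentation.

```python
class SerializationError(Exception):
    """Raised when a transaction's snapshot is stale at write time."""


class Store:
    """Toy single-row store: one committed value plus a version counter."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0


class Txn:
    """Hypothetical transaction with a snapshot taken at BEGIN."""

    def __init__(self, store, policy):
        self.store = store
        self.policy = policy  # "current_read" or "first_updater_wins"
        self.snap_value = store.value      # what plain SELECTs will see
        self.snap_version = store.version  # used to detect concurrent commits
        self.pending = None                # uncommitted write, if any

    def select(self):
        # A plain read sees our own pending write, else the snapshot.
        return self.snap_value if self.pending is None else self.pending

    def increment(self):
        # Models: UPDATE foo SET bar = bar + 1
        if self.policy == "first_updater_wins":
            if self.store.version != self.snap_version:
                raise SerializationError("row changed since snapshot")
            base = self.snap_value
        else:
            # "current_read": ignore the snapshot, use the latest committed value.
            base = self.store.value
        self.pending = base + 1

    def commit(self):
        if self.pending is not None:
            self.store.value = self.pending
            self.store.version += 1


def run_scenario(policy):
    """Two clients SELECT, then the second UPDATE runs after the first commits."""
    store = Store(0)
    a, b = Txn(store, policy), Txn(store, policy)
    reads = (a.select(), b.select())  # both snapshots were taken before any write
    a.increment()
    a.commit()
    try:
        b.increment()  # in a real engine this would have waited on a's row lock
        b.commit()
        aborted = False
    except SerializationError:
        aborted = True
    return reads, store.value, aborted


print(run_scenario("current_read"))        # ((0, 0), 2, False)
print(run_scenario("first_updater_wins"))  # ((0, 0), 1, True)
```

In both runs the two SELECTs return 0. The difference is only in what the second UPDATE does with the row that changed underneath it: silently read the newer value, or refuse and force a retry.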