Hacker News new | ask | show | jobs
by erhaetherth 801 days ago
I'm having trouble with the example given.

If `UPDATE player SET level = 'AA' WHERE team = 'Gophers';` is executed before the player swap, then why should "Stonebreaker" be upgraded to "AA"? I'd be pretty mad at my database if I sent those 2 queries in sequence and my DB decided to re-order them.

The sleep actually really complicates things here. I understand some queries run slower than others and the sleep is a useful tool to artificially slow things down, but now I don't know I don't know if I should interpret that as one command or two. If `WITH sleep AS (SELECT pg_sleep(5)) UPDATE player SET level = 'AA' FROM sleep WHERE team = 'Gophers';` is atomic then I'd expect it to put a lock on the 3 Gophers (which doesn't include Stonebreaker), wait the 5 seconds and then complete the update. The player swap would be blocked for those 5 seconds because it touches a row that's being updated.

2 comments

The timing of this example is tricky because the two update statements execute concurrently (which is only possible under read committed isolation; under serializable isolation it's much more like what you're describing).

Here's a full timeline in PG (U1 for first update, U2 for second update):

0. U1 begins executing, establishes read snapshot, starts pg_sleep(5).

1. U2 runs to completion.

2. U1 wakes up after 5 sec, scans `player` using snapshot from step 0.

3. U1 filters `team = Gophers`, gets 4, 5, 6.

4. U1 locks 4, 5, 6.

5. U1 performs EvalQualPlan: re-scans latest version of those locked rows, which sees U2's write to 4 but not to 3.

6. U1 performs EvalQualPlan: re-filters those locked rows using latest version, gets 5, 6.

7. U1 writes new versions.

CRDB is easier to reason about: after U1 wakes up from the sleep, it sees that it conflicts with U2 and simply retries the entire statement.

> If `UPDATE player SET level = 'AA' WHERE team = 'Gophers';` is executed before the player swap, then why should "Stonebreaker" be upgraded to "AA"? I'd be pretty mad at my database if I sent those 2 queries in sequence and my DB decided to re-order them.

It’s easy to miss but in the swap query the levels also get swapped. Because — and it’s harder to miss but easy to skip over — given what constraint 2 says the level is actually a team level, not a player level.

So in a seqcst view, either the team’s players get upgraded to AA then the players’ levels get swapped during the exchange, or the players get exchanged then the team’s players get upgraded.

In both sequences you end up with Stonebaker as an AA gopher and Lamport as an A dolphin.

> given what constraint 2 says the level is actually a team level, not a player level.

Then it's bad unnormalized data design that is the problem here. If that is a team level, it should be in the team table, not the player table.

> Then it's bad unnormalized data design that is the problem here.

That the table is not normalised makes the example somewhat confusing but it does not actually affect the issue being demonstrated. And denormalisation is a fact of data modelling.

Why not simply use an example that isn't confusing? Many developers, and especially academics, love wasting effort and time on solving issues they made up but that have no real life examples. When using examples that have trivial alternative solutions, it does not help me as a reader to distinguish whether this is a real problem, or something made up.
Because every example you can think of is confusing, because understanding how concurrent transactions should operate is confusing.

Almost all problems like this can be solved by improving an application data model, but here’s the thing, lots applications have dodgy data models, either due to time constraints, or because the application evolved over time, and the data model didn’t. So these are all real world problems and examples, but creating a “simple” problem to demonstrate the issue almost certainly means also creating an example where other “obvious” solutions exist.

Just because you can’t imagine how this simple example might represent a much more complex “real world” problem, doesn’t mean it doesn’t exist.

> If that is a team level, it should be in the team table, not the player table.

It's all contrived, of course, but the reason I would consider skill level to be a player attribute rather than a team attribute is that there could be free agents with a skill level but no team:

INSERT INTO player VALUES (10, 'Pavlo', 'AAA', NULL);

Then with enough free agents, you could imagine building a new team out of free agents that are all at the same skill level:

UPDATE player SET team = 'Otters' WHERE level = 'AAA' AND team IS NULL ORDER BY id LIMIT 3;