

by ztorkelson 2948 days ago

Thanks for the citation, root!

I agree, in spirit, with all three of his hypotheses for why the situation doesn't seem to be as dire as one might otherwise expect. Peter writes that "it's possible data is actually corrupted, and apps don't care". That's true, of course, but it's also true that the corruption is not always prominently visible.

Historically, improper concurrency control (e.g. in multithreaded programs written in unsafe languages) could quickly result in much more than logical invariant violations. Physical data corruption and memory access violations were (and still are) prevalent, with readily apparent results.

Modern database systems, however, aren't going to crash when you hit a serialization anomaly. They're not going to physically corrupt your data: your numbers will still be numbers, your strings will still be strings, and your foreign keys will still be sound.

We do occasionally bear witness to prominent failures that stem from weak database isolation levels: "oh, look at that cryptocurrency exchange backed by [database du jour], Eve was able to overdraw her account by issuing a bunch of concurrent requests. What amateurs!" and we all point and laugh from our houses of glass.
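To make the overdraw scenario concrete, here is a minimal sketch in Python. It is not taken from any real exchange: the `withdraw` helper, the balance of 100, and the two requests for 80 are all made up for illustration, and the "concurrency" is simulated by letting both requests read the balance before either one writes.

```python
def withdraw(balance_seen, amount):
    """Hypothetical handler: return the balance to write back, or None if rejected."""
    if balance_seen >= amount:       # check against the value this request read
        return balance_seen - amount
    return None

balance = 100  # Eve's real balance (illustrative number)

# Both requests read the balance before either one writes it back.
seen_1 = balance
seen_2 = balance

new_1 = withdraw(seen_1, 80)
new_2 = withdraw(seen_2, 80)

paid_out = 0
for new_balance in (new_1, new_2):
    if new_balance is not None:
        balance = new_balance        # the later write silently overwrites the earlier one
        paid_out += 80

print(balance, paid_out)
```

Both checks pass because each request only ever saw a balance of 100, so Eve is paid more than the account held, and nothing crashes or logs an error along the way.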

In truth, it's not the anomaly that's abnormal, it's the visibility. Those kinds of anomalies happen all the time, even to experienced practitioners, but the failure modes will generally be much more obscure and application-specific. Most aren't going to manifest as some exception in your logs, let alone as a top post on HN. You'll only find out about the problem when a ticket comes in saying "hey, the summary and itemized reports don't add up" or "this section is overflowing because the user has 6 email addresses and we told the designers that the application limits it to 5". That's how serialization anomalies usually manifest, and it's a death-by-a-thousand-cuts situation, because each occurrence saps precious time from everyone involved.
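The email-limit case can be sketched the same way. This is a toy model, not any particular database: `add_email`, `MAX_EMAILS`, and the addresses are hypothetical names, and each request's snapshot is modeled as a plain list copy taken before the other request commits.

```python
MAX_EMAILS = 5  # the limit the application believes it enforces

def add_email(snapshot, table, address):
    """Hypothetical check-then-insert: check the snapshot, write to the shared table."""
    if len(snapshot) < MAX_EMAILS:   # invariant checked against possibly stale data
        table.append(address)        # the insert itself is perfectly valid
        return True
    return False

# Shared table: the user already has 4 addresses.
emails = [f"user+{i}@example.com" for i in range(4)]

# Two concurrent requests each take their snapshot before either one commits.
snapshot_a = list(emails)
snapshot_b = list(emails)

ok_a = add_email(snapshot_a, emails, "a@example.com")
ok_b = add_email(snapshot_b, emails, "b@example.com")

print(ok_a, ok_b, len(emails))
```

Each request saw 4 addresses and was allowed to add one, so the table ends up with 6. Every row is well-formed, and the only symptom is the overflowing section that shows up in a ticket later.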

We have to do better.

1 comment

foldU 2948 days ago

Agreed! Peter has actually done some really great research into this, finding real vulnerabilities in applications due to insufficient transaction isolation: http://www.bailis.org/papers/acidrain-sigmod2017.pdf
