Hacker News new | ask | show | jobs
by blr246 2603 days ago
To implement the serializable isolation level, the database system must track access to every single row you access, even the ones you read or filter out. (The need to track reads, even for rows filtered out of a select statement is surprising but necessary.)

Consider a common scenario where you SELECT a set of rows and take a SUM over a column. Suppose your query and another query begin reading from the same committed state of the same table. Suppose that the other query uses an UPDATE command on a set of rows in the table, and that the other query commits before yours does. In order to be consistent, the database system must detect the situation where the other query updated a row that affects the filter WHERE you scanned the table, otherwise your sum could be incorrect if the other query's committed state would cause your query to compute a SUM over a different set of rows or over modified values in your SUM. The only way for the database system to guarantee there is no conflict is to keep track of every single row your query accesses, even if it is a row passed over by a WHERE clause!

Serializability is a well studied concept. You can find lots of good resources about algorithms for implementing concurrency control and for detecting whether or not two transactions are serializable. The high-level summary is that it takes a lot of operational bookkeeping to guarantee that two queries have no conflicts, especially when you are using real-world examples having many filters and joins.

1 comments

> The only way for the database system to guarantee there is no conflict is to keep track of every single row your query accesses, even if it is a row passed over by a WHERE clause!

Why is putting a lock on the table itself not an option?

You're correct that a table-level lock is a useful tool to ensure serializability, but it's also a drastic one since it blocks other transactions requesting a higher access level. The concurrency control system is trying to provide a generic way to both guarantee consistency and maintain a high throughout on a variety of workloads, and locking tables always would diminish throughout for many common workloads. For example, one where inserts and reads tend to occur simultaneously at high volume.