Hacker News new | ask | show | jobs
by nuttingd 918 days ago
One caveat to serializable transactions in Postgres is that ALL concurrent transactions must be running with the SERIALIZABLE isolation level to protect against serialization anomalies.

This is a bit jarring if you come from MSSQL, which implements the SERIALIZABLE isolation level using locks. In MSSQL, you can rest assured that a serializable transaction will not be affected by changes from other concurrent transactions, regardless of their isolation level.

In Postgres, you may have a set of transactions all participating in SERIALIZABLE isolation today, but tomorrow someone adds another script without the SERIALIZABLE isolation level, and now your protected paths are no longer isolated.

4 comments

We use Sybase SQLAnywhere at work, which also implements SERIALIZABLE using locks. Naive me thought that meant a lock on the table, but no, it locks all the rows... Not great for a table with many rows!

We were essentially trying to avoid inserting the same value twice, so we ditched SERIALIZABLE and instead added a unique index along with a retrying loop on the client side.

Or from the other perspective of the trade-off: One caveat with MSSQL is that ALL concurrent transactions must pay the overhead if _some_ transactions need serializable guarantees?
Only if they touch the same data. If they are touching disjoint sets of data then there is no overhead to be paid by non-SERIALIZABLE transactions.
There has been some recent improvement to locking behavior:

https://learn.microsoft.com/en-us/sql/relational-databases/p...

Oh that's super nasty, is it mentioned somewhere in the doc?

Is it the same for repeatable read?

I have read the docs plenty of times, but it never stuck for me until I read the (free!) PostgreSQL 14 Internals ebook: https://postgrespro.com/community/books/internals

Quoted from Page 70:

If you use the Serializable level, it must be observed by all transactions of the application. When combined with other levels, Serializable behaves as Repeatable Read without any notice. So if you decide to use the Serializable level, it makes read sense to modify the default_transaction_isolation parameter value accordingly -- even though someone can still overwrite it by explicitly setting a different level.

I had a real "WTF?" moment when I read this the first time.

Interesting, this book looks really cool!
It is mentioned in the doc, but can be easy to mis-understand:

"If a pattern of reads and writes among concurrent serializable transactions would create a situation which could not have occurred for any serial (one-at-a-time) execution of those transactions, one of them will be rolled back with a serialization_failure error."

Note that it says nothing about the non-serializable transactions.

https://www.postgresql.org/docs/current/sql-set-transaction....

It is in the Wiki: https://wiki.postgresql.org/wiki/Serializable Any transaction which is run at a transaction isolation level other than SERIALIZABLE will not be affected by SSI. If you want to enforce business rules through SSI, all transactions should be run at the SERIALIZABLE transaction isolation level, and that should probably be set as the default.

I guess it is the same for all MVCC databases. They don't want to acquire a read lock just in case another transaction is in Serializable

The OP mentioned that SQLServer does lock (but then, it doesn't use SSI - just SI).

The recommendation in PG docs to use a combination of SERIALIZABLE and READ ONLY transactions seems like a good one for read-heavy systems.

This is pretty intuitive when you think about predicate locks that Postgres uses to detect conflicts.

If you have one SERIALIZABLE transaction that sets some locks, and one non-SERIALIZABLE that doesn't, then they can't "see" each other "by definition".

But your point stands--there could be some kind of "warning flag" somewhere, that would alert if SERIALIZABLE transactions overlap with non-SERIALIZABLE ones. Or maybe there _already_ is something like that??