Hacker News new | ask | show | jobs
by lifepillar 1744 days ago
One thing PostgreSQL would likely not be able to adapt to, at least without significant effort, is dropping MVCC in favor of more traditional locking protocols.

While MVCC is fashionable nowadays, and more or less every platform offers it at least as an option, my experience, and also opinions I have heard from people using SQL Server and similar platforms professionally, is that for true OLTP at least, good ol’ locking-based protocols in practice outperform MVCC-based protocols (when transactions are well programmed).

The “inconvenient truth” [0] that maintaining multiple versions of records badly affects performance might in the future make MVCC less appealing. There’s ongoing research, such as [0], to improve things, but it’s not clear to me at this point that MVCC is a winning idea.

[0] https://dl.acm.org/doi/10.1145/3448016.3452783

9 comments

I sort of want the opposite. Except for extremely high velocity mutable data, why do we ever drop an old version of any record? I want the whole database to look more like git commits - completely immutable, versionable, every change attributable to a specific commit, connection, client, user.

So much complexity and stress and work at the moment comes from the fear of data loss or corruption. Schema updates, migrations, backups, all the distributed computing stuff where every node has to assume every other node could have mutated the data .... And then there are countless applications full of "history" type tables to reinstate audit trails for the mutable data. It's kind of ridiculous when you think about it.

It all made sense when storage was super expensive but these days all the counter measures we have to implement to deal with mutable state are far more expensive than just using more disk space.

If the old versions of records stay where they are, they will start to dominate heap pages and lead to a kind of heap fragmentation. If the records are still indexed, then they will create an enormous index bloat. Both of these will make caches less effective and either require more RAM or IOPS, both of which are scarce in a relational db.

You probably need a drastically different strategy, like moving old records to separate cold storage instead (assuming you might ocassionally want to query it. Otherwise you can just retain your WAL files forever).

Absolutely ... it needs to be designed from the ground up - why I think it fits the question of something a modern database could do that Postgres would struggle with.
> moving old records to separate cold storage

FWIW this is available in ClickHouse (which is an analytics database, though)

https://clickhouse.tech/docs/en/engines/table-engines/merget...

You should check out dolt, does exactly what you're describing, and is a drop-in MySQL replacement:

https://github.com/dolthub/dolt

The downside of "good ol' locking" is that you can end up with more fights and possibly deadlocks over who gets access and who has to wait.

With Postgres/Oracle MVCC model, readers don't block writers and writers don't block readers.

It's true that an awareness of the data concurrency model, whatever it is, is essential for developers to be able to write transactions that work as they intended.

It’s not that conflicts magically disappear if you use MVCC. In some cases, PostgreSQL has to rollback transactions whereas a 2PL-based system would schedule the same transactions just fine. Often, those failures are reported ad “serialization errors”, but the practical result is the same as if a deadlock had occurred. And Postgres deadlocks as well.
This is already being developed. zheap [1] is based on the undo/redo log system that databases such as Oracle use, and will be one of the options once Postgres supports pluggable storage backends.

[1] https://www.cybertec-postgresql.com/en/postgresql-zheap-curr...

I don't know. Lock-based concurrency control has problem scaling up concurrent access among readers and writers for read consistency. Of course Oracle with MVCC still beats everybody performance wise.
Does "more traditional locking" mean that the clients request locks? Isn't it already offered by PG with table-level and row-level locks, and also in an extremely flexible way (letting the client developer define the semantics) by pg_advisory_lock?

https://www.postgresql.org/docs/current/explicit-locking.htm...

Having time travel capabilities (eg. cockroachdb) is a really useful side effect of MVCC though. Postgres once had this capability. Need for garbage collection/vacuuming is a downside.

I think it all depends on the pattern of reads, writes, and the types of writes. Mysqls innodb is often faster than postgres but under some usage patterns suffers from significant lock contention. (I have found it gets worse as you add more indexes)

Personally I wish Postgres would add support for optimistic concurrency control (for both row-level and "predicate" locks), which can be a big win for workloads with high throughput and low contention.
> good ol’ locking-based protocols in practice outperform MVCC-based protocols

Doesn't make sense to me. Oracle supports MVCC. SQL Server doesn't scale as well as Oracle.

There is absolutely no reason at all that using a Model View Controller architecture should say anything about your persistence layer. Model View Controller is an abstraction for managing code that generates things that are put on a screen. It says that you should have code that represents underlying data separate from code that represents different ways to show the data to the user and also separate from code that represents different ways for the user to specify how they would like to view or modify the data.

The Model portion of MVC should entirely encapsulate whether you are using a relational database or a fucking abacus to store your state. Obviously serializing and deserializing to an Abacus will negatively impact user experience, but theoretically it may be a more reliable data store than something named after a Czech writer famous for his portrayl of Hofbureaucralypse Now .

What's being discussed is MVCC, or multiversion concurrency control

https://en.wikipedia.org/wiki/Multiversion_concurrency_contr...