by arjunnarayan 2949 days ago
> Postgres is a notable exception

Did you have good experiences with Postgres' serializable mode? When I tried some TPC-C benchmarking with Postgres set explicitly to serializable mode, it would fall over almost instantly (read: unable to get beyond ~100 warehouses). I'd love to read anything you have to say about getting good performance out of Postgres in serializable mode, because I was unable to find this promised land myself.

2 comments

I have some experience with that specific question, although it was a few years ago...

Do you remember what conflict was causing things to fall over?

In general, the usual important parameters to tweak are:

- max_pred_locks_per_transaction may need to be increased; otherwise predicate locks get promoted to coarser granularity (tuple to page to relation) to save room in the lock table, which produces more false conflicts

- for tables that fit in memory, the planner may choose a sequential scan even when an index scan is available. That can be faster, but a sequential scan takes a relation-level predicate lock, so it creates more conflicts on a serializable workload. Increasing cpu_tuple_cost should avoid that (or even just set enable_seqscan=off to steer the planner toward indexes whenever available)
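
For what it's worth, here is a rough sketch of how those two tweaks might be applied. The numeric values and the helper names are made up for illustration and would need tuning per workload. One detail worth knowing: the planner knobs can be changed per session with SET, but max_pred_locks_per_transaction can only be set at server start, so it has to go into postgresql.conf.

```python
# Illustrative values only; none of these numbers come from the thread.

# Planner settings: changeable per session with SET.
SESSION_SETTINGS = {
    "cpu_tuple_cost": "0.1",   # Postgres default is 0.01
    "enable_seqscan": "off",   # blunt option: discourage sequential scans
}

# Lock-table sizing: only takes effect at server start (postgresql.conf).
SERVER_SETTINGS = {
    "max_pred_locks_per_transaction": "256",  # Postgres default is 64
}


def render_session_sql(settings):
    """Render one SET statement per planner setting."""
    return [f"SET {name} = {value};" for name, value in settings.items()]


def render_conf_lines(settings):
    """Render postgresql.conf lines for settings that need a restart."""
    return [f"{name} = {value}" for name, value in settings.items()]


if __name__ == "__main__":
    print("\n".join(render_session_sql(SESSION_SETTINGS)))
    print("\n".join(render_conf_lines(SERVER_SETTINGS)))
```

You would normally pick either the cpu_tuple_cost change or enable_seqscan=off, not both, and weigh fewer aborts against any change in query execution time.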

I actually don't have much hands-on experience with Postgres, so I can't speak to how it performs in practice.

My comment was highlighting that Postgres' implementation of SSI is at least a better starting point than most other purely pessimistic implementations of the serializable isolation level. It does not surprise me, however, to hear that there are still performance deficiencies in practice. For example, the Postgres implementation is imprecise: it tracks conflicts conservatively, which will result in spurious transaction aborts (those will presumably be retried, but that comes at a cost to latency and throughput). And even though reads are optimistic (which will perform better than pessimistic concurrency control, if contention rates are sufficiently low), maintaining transaction dependency metadata can still have a non-negligible cost.
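
To make the retry cost concrete, here is a minimal sketch of the kind of retry loop a client would wrap around each serializable transaction. Postgres reports these aborts with SQLSTATE 40001 (serialization_failure); the SerializationFailure class and run_with_retry helper below are hypothetical stand-ins, not any real driver's API, and the backoff policy is an assumption.

```python
import random
import time


class SerializationFailure(Exception):
    """Stand-in for a driver error carrying SQLSTATE 40001."""
    sqlstate = "40001"


def run_with_retry(txn, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call txn() until it succeeds or max_attempts is exhausted.

    txn is expected to begin, run, and commit one transaction, and to
    raise SerializationFailure if the database aborts it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and surface the abort to the caller
            # Jittered exponential backoff so retries do not collide again.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Every spurious abort means redoing the whole transaction body plus a backoff delay, which is where the latency and throughput hit comes from.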

I also think Postgres suffers from other, more dubious choices, like a lack of clustered indexes, which can exacerbate things.

My work in this area has focused on revisiting various architectural and implementation design decisions which have contributed to the current state of affairs. I think the relational data model is generally the right choice, but most relational databases have some godawful characteristics around performance, scalability, reliability, and programmability which limit their efficacy in practice.