Hacker News new | ask | show | jobs
by tjelen 3353 days ago
Here are some issues that I've seen becoming important when scaling to a higher traffic with both MySQL and PostgreSQL (we're running both for different kinds of databases). Some of this is second-hand from my colleagues, so it might not be exact:

* Thread-per-connection (as opposed to process-per-connection with PG) means much lower memory overhead with many concurrent connections; with PG you might need to use something like pgbouncer, which doesn't come without its own issues.

* Logical replication; the MySQL format is compatible between minor release versions (e.g. between all 5.x releases), which enables upgrading individual replicas or master without complete shutdown. With PG this often a becomes way more complex.

* Better caching/buffer management. Essentially, InnoDB has its own implementation of LRU cache while PG relies on kernel page cache. This generally allows finer tuning on the database configuration level and you don't have dive into kernel page cache tuning (which will be always OS-specific).

EDIT: most of this stuff is probably even better described in the Uber article, linked in other comments here.

1 comments

A note on logical replication. That exists in pg as an add-on right now. And will be in base/core when pg10 ships.
Any ETA on the PG equivalent of mixed replication - row based whenever possible, logical when row-based doesn't work?

Logical replication has the problem that missed messages can have really nasty effects on your data down the road.

fwiw, all 3 of MySQL's binlog formats (statement-based, row-based, mixed) are still actually logical replication:

* Statement-based is logging raw write SQL

* Row-based is logical effects of executing that SQL

* Mixed uses statement-based for everything except statements that are non-deterministic or otherwise unsafe for statement-based replication

Physical replication in MySQL would be storage-engine dependent. For example, in InnoDB physical replication would mean shipping the InnoDB transaction logs rather than the separate binary logs. Alibaba built something to do this, and from what I understand Amazon may be doing something along these lines under the hood in RDS MySQL and/or RDS Aurora.