You can get high-enough availability just fine on Postgres. Very few applications require zero downtime. With pgbouncer or similar in front, you can generally flip to a slave with minimal impact. The issue comes in situations like this one, where a mistake leaves you without up-to-date slaves and a single server can't handle your read load.

I agree with you in principle, but for most systems a distributed setup is total overkill. It wouldn't be if distributed solutions were easy to set up and came without tradeoffs, but we're nowhere near that point.

In most cases, then, restoration time is the biggest barrier to getting "high-enough" availability without re-engineering everything for a totally different system. Often you can prevent that from becoming an issue by siloing functionality into separate databases, for example by offloading logs and analytics. Or by buying faster SSDs for your DB servers... There are many approaches depending on the size of your dataset, and most people never outgrow those options.

To put it this way: Gitlab.com's database is small enough that fitting it in RAM on a commodity server is easily doable. While they'd still need to have snapshots on disk, at that point beating the restore speeds they're reporting would be trivial.
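
To make the restore-time point concrete, here is a rough back-of-envelope sketch. The function name and every number in it are hypothetical, picked only for illustration (they are not GitLab's figures), and it assumes restore time is dominated by sequential read throughput, ignoring WAL replay, index rebuilds, and network transfer.

```python
def estimate_restore_seconds(dataset_gb: float, throughput_mb_per_s: float) -> float:
    """Rough restore time: dataset size divided by sustained throughput.

    Hypothetical helper for illustration. Assumes the restore is bound
    purely by sequential throughput, which real restores rarely are.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return dataset_gb * 1024 / throughput_mb_per_s


if __name__ == "__main__":
    # Made-up numbers: one 400 GB database vs. the same data with
    # 100 GB of logs/analytics siloed into a separate database.
    slow_disk = 200    # MB/s, assumed
    fast_ssd = 2000    # MB/s, assumed

    for label, size_gb in [("everything in one DB", 400), ("core DB after siloing", 300)]:
        for disk, rate in [("slow disk", slow_disk), ("fast SSD", fast_ssd)]:
            minutes = estimate_restore_seconds(size_gb, rate) / 60
            print(f"{label}, {disk}: ~{minutes:.1f} min")
```

Both levers mentioned above show up directly in the formula: siloing shrinks the numerator for the database you actually need back first, and faster storage grows the denominator.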