

by raffraffraff 896 days ago

It amazes me how much better MySQL has been in this regard for at least a decade, and it's also amazing that it's still not that well-known today. Back in 2015 I worked at a fast-growing unicorn that had badly implemented basically everything because they started with a tiny ops team of grads and developers. Very little was being monitored, and there were only a handful of metrics being graphed (mostly network stuff in Cacti). Our DB issues were all caused by stupid stuff:

* undetected hard disk in array fails

* battery in array controller fails

* disk fills up

* dubious backups, with no point-in-time recovery

* extremely poorly written SQL queries

* poorly configured MySQL (in oh-so-many ways)

The top three (at least) would ultimately cause replication lag, which would eventually trigger an alert. ... And yet we never lost a cluster. (And we had a lot of them!)
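For anyone wondering what "replication lag triggers an alert" looks like in practice, here is a minimal sketch, not what we actually ran. It assumes you have already fetched one row of `SHOW SLAVE STATUS` as a dict; the column names are the pre-8.0.22 ones (newer MySQL calls them `Replica_IO_Running`, `Replica_SQL_Running` and `Seconds_Behind_Source`). The function name and the 60 second threshold are made up for illustration.

```python
from typing import Mapping, Optional, Union

# Hypothetical threshold, chosen only for this example.
LAG_THRESHOLD_SECONDS = 60


def replication_alert(
    status: Mapping[str, Optional[Union[str, int]]],
    threshold: int = LAG_THRESHOLD_SECONDS,
) -> Optional[str]:
    """Return an alert message, or None if replication looks healthy.

    `status` is assumed to be one row of SHOW SLAVE STATUS as a dict.
    """
    io_running = status.get("Slave_IO_Running")
    sql_running = status.get("Slave_SQL_Running")
    lag = status.get("Seconds_Behind_Master")

    # A stopped replication thread is a problem regardless of reported lag.
    if io_running != "Yes" or sql_running != "Yes":
        return f"replication threads not running (IO={io_running}, SQL={sql_running})"

    # MySQL reports NULL here when it cannot compute the lag.
    if lag is None:
        return "replication lag unknown (Seconds_Behind_Master is NULL)"

    if int(lag) > threshold:
        return f"replication lag {lag}s exceeds {threshold}s"

    return None
```

The point is that a check like this only catches the disk, battery and full-volume failures indirectly, after they have already slowed the replica down, which is why it was a last line of defence rather than real monitoring.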

My team sweated blood improving processes and tooling, and then I spent a 6 month stint on database clusters (switching to GTID based replication and rewriting the ops config code so that they were all consistently configured and monitored).

Occasionally we'd get a new senior hire insisting that PostgreSQL was a necessity, so we'd stand back and let them produce a proof of concept that stood up to the types of failures our MySQL clusters dealt with regularly, without waking oncall up at night. And it was always a bit of a joke by comparison.