| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by cookiecaper 3603 days ago

You're obviously qualified so I'd love to get more info on how you deployed MySQL replication. I have a low opinion of it after using it at my employer for the last few years. MySQL allows writes to slaves (yikes) and the binlog coordinates are a constant hassle. ROW, STATEMENT, and MIXED binlog formats can make interoperability a mess and further increase the risk that there will be discrepancies between slave and master data (for some reason, we're on STATEMENT and/or MIXED replication, which means that the slave receives a list of the queries to execute and may have different output than the master). To sync up a new slave, you have to set up slave_skip_errors to ignore primary key conflicts and other common errors that occur because it's impossible to really get a firm binlog coord (it's really only marginally possible due to Percona's efforts with innobackupex/mydumper). I know that GTIDs are available now and are supposed to resolve the issues with binlog coords, but we haven't activated/used this featureset yet. Is it really a magic bullet?

MySQL just seems to do replication in a very unstable way that no other major RDBMS is willing to approach. I use PostgreSQL replication on a personal project and while it takes a smidgen longer to set up (maybe 20 minutes instead of 5), it seems well worth it to me. I never have to worry about whether my slave's output matches the master (because the query is performed on the master and the result shipped out via the WAL and that's the only way to use replication). I don't have to worry that I connected to the wrong endpoint and accidentally wrote to the slave, causing a conflict that requires a new full dump and resync. I don't have to worry that binlog coords are going to be wrong or that a write will go in at just the wrong moment and potentially require me to redump and resync the whole thing again (but at least require me to skip errors).

How do you handle these issues with MySQL replication, and justify the risks in order to shave a few minutes off the upfront setup time?

1 comments

rwultsch 3603 days ago

Hi cookiecaper, All our operational code (other than pinterest specific bits) is open sourced https://github.com/pinterest/mysql_utils

Both statement and row based replication can be very reliable in modern versions of MySQL. It is my experience the ways data is corrupted are: 1. read_only not being set on slaves, so random users can write to the slave. We set read_only on startup based on service discovery. 2. Bad automation for failovers. See https://github.com/pinterest/mysql_utils/blob/master/mysql_f... for how we do it. 3. Crashes without all the durability settings being on.

If you are having to run slave_skip_errors, you are doing it wrong. You should checkout out our automation for backups and restores. They can be found in mysql_restore.py and mysql_backup.py .

With regard to PG replication, I suggest you watch https://m.youtube.com/watch?v=bNeZYVIfskc&t=26m54s Uber had a 16 hours outage in large part caused by pg replication issues.

Vacuum issues are also no joke.

There are a host of other issues: MySQL can deal with large numbers of connections, PG needs middleware. MySQL is more efficient with "web" workloads where most queries only need to pull on row. etc...

-Rob

cookiecaper 3602 days ago

Those MySQL utilities definitely have a lot of interesting stuff in them and will be good for further research, so thanks. After digging up the xtrabackup command used in there I don't see anything obvious that we were missing (we no longer perform xtrabackup commands since we're all RDS now), but will continue to research. It actually looks like some instances may have been as simple as using the wrong slave coords file; apparently xtrabackup_slave_info contains the master coords, not xtrabackup_binlog_info, which records where the slave is in its own binlog. That's a bit counterintuitive.

We've had other problems with innobackupex, though. We were working with Percona support on a case where a backup lock blocked writes to the DB for 40 minutes and never really resolved it. We had to use minimalist locking parameters in several other cases, which may have also contributed to incorrect binlog coordinates. We experienced a variety of other bugs and issues as well, including corrupted database files, and normally had to invoke our Percona Gold contract to get workarounds or patches.

I'll just note that this class of errors doesn't seem possible in any non-MySQL replication system; there is no slave_skip_errors setting in PgSQL. Your slave either has integrity or it doesn't. That's the way slaves should be. A sane database system won't allow a user to write to a slave or to skip replication rows. I'll also note that the band-aids that make MySQL semi-usable are only there because of Percona's efforts. This stuff doesn't make MySQL seem promising, even if there are workarounds for some of the problems.

Uber's problems as described in that video had nothing to do with PgSQL replication. The carnage was caused by running out of disk space. MySQL doesn't behave well when it gets to 0 free space either; I know from experience. It's as much AWS's fault as PgSQL's, because the reason their disk filled up was a change to IAM requirements. He mentions briefly an attempt to hack Pg replication so it would try to resync from a file with a corrupted header, but probably good for him that that didn't work.

Can't speak to complications associated with vacuum as I've never had to deal with super-large PgSQL databases and pgbouncer is indeed annoying.