by evanelias 3520 days ago
Facebook has hot replicas in every region. But replicas and backups serve completely different purposes.

Replicas are for failover and read scalability. For failover: when a master dies unexpectedly, Facebook's automation promotes a replica to be the new master in under 30 seconds, with no loss of committed data.

Backups are for when something goes horribly wrong -- e.g. due to human error -- and you need to restore the state of something (a row, table, entire db, ...) to a previous point in time. Or perhaps effectively skip one specific transaction, or set of transactions. Replicas don't help with this; as you mentioned, they're kept up-to-date with the master. So a bad statement run on the master will also affect the replicas.
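
To make the point-in-time idea concrete, here's a toy sketch in Python. It is not how MySQL or Facebook's tooling does it; `Txn`, `restore`, and the sample data are all made up for illustration. It just models "full backup plus a replayable log", stopping before a bad statement or skipping it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Txn:
    """Hypothetical log entry: one committed single-key change."""
    txn_id: int
    committed_at: int        # logical timestamp, not wall-clock time
    key: str
    value: Optional[str]     # None models a delete


def restore(snapshot, log, until, skip=frozenset()):
    """Rebuild state from a full backup plus archived log entries.

    Replays entries in commit order up to and including `until`,
    leaving out any transaction whose id is in `skip`.
    """
    state = dict(snapshot)   # copy, so the backup itself is never mutated
    for txn in sorted(log, key=lambda t: t.committed_at):
        if txn.committed_at > until:
            break
        if txn.txn_id in skip:
            continue
        if txn.value is None:
            state.pop(txn.key, None)
        else:
            state[txn.key] = txn.value
    return state


# Made-up data: a full backup, then three logged transactions.
snapshot = {"alice": "active", "bob": "active"}
log = [
    Txn(1, 10, "carol", "active"),
    Txn(2, 20, "alice", None),       # the bad statement: accidental delete
    Txn(3, 30, "bob", "suspended"),
]

# Restore to just before the mistake...
before_mistake = restore(snapshot, log, until=19)
# ...or replay everything except the bad transaction.
without_mistake = restore(snapshot, log, until=30, skip={2})
```

In a real database, skipping a transaction is only safe when nothing committed later depends on it, which is why I said "effectively".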

Occasionally you have a massive failure involving both concepts: say you have 4 replicas and they're all broken or corrupted in some way. Backups are helpful in that case as well.


I suppose at Facebook scale it might be infeasible, but couldn't you get the same effect by archiving log segments and taking a periodic binary full backup? This is precisely what I do with my PostgreSQL databases (though with some friendly automation with pg barman); I assume you could do the same with some tooling around MySQL's binlog facilities.

Yes, although if using the binlogs as-is, that's effectively incremental backup instead of differential. The disadvantages of incremental solutions are that they require more storage and take longer to restore (especially if only doing full backups every few days); the upside is less complexity.
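
A rough sketch of why the restore time differs, assuming the usual definitions (an incremental holds changes since the previous backup of any kind, a differential holds changes since the last full backup). The function name, the `(day, kind)` tuples, and the schedule are made up for illustration:

```python
def restore_chain(backups, target_day, scheme):
    """Return the backups that must be applied, in order, to reach target_day.

    backups: list of (day, kind) tuples sorted by day, where kind is
             "full" or "partial" (hypothetical representation).
    scheme:  "incremental"  -> each partial holds changes since the
                               previous backup of any kind
             "differential" -> each partial holds changes since the
                               last full backup
    """
    usable = [b for b in backups if b[0] <= target_day]
    full_positions = [i for i, (_, kind) in enumerate(usable) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup at or before target_day")
    last_full = full_positions[-1]
    partials = usable[last_full + 1:]

    if scheme == "incremental":
        # Every partial since the full backup has to be replayed in order.
        return [usable[last_full]] + partials
    if scheme == "differential":
        # Only the most recent partial is needed on top of the full backup.
        return [usable[last_full]] + partials[-1:]
    raise ValueError(f"unknown scheme: {scheme}")


# Made-up schedule: one full backup on day 0, then a partial every day.
schedule = [(0, "full"), (1, "partial"), (2, "partial"),
            (3, "partial"), (4, "partial")]

incremental_steps = restore_chain(schedule, target_day=4, scheme="incremental")
differential_steps = restore_chain(schedule, target_day=4, scheme="differential")
```

With this schedule the incremental restore applies five pieces and the differential restore applies two, and the gap grows the longer you go between full backups.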