Hacker News new | ask | show | jobs
by mappu 3523 days ago
> We do these by taking another full dump and only storing the difference between it and the last full backup that was taken.

Do you diff against the same base, or create an incremental chain? How many diffs do you take in between recapturing a full image? At $DAYJOB we always take full backups into a fast in-house deduplicating store.

> Periodically, each peon syncs with the ORC DB to look for new jobs assigned to it

Is there no better way to handling this than polling?

> LOAD - Load the downloaded backup into the peon’s local MySQL instance. Individual tables are restored in parallel by parsing out statements pertaining to those tables from the backup file, similar to Percona's mydumper.

Presumably you can only get this parallelism by disabling FK integrity. Is it re-enabled in the following VERIFY stage?

1 comments

> Do you diff against the same base, or create an incremental chain? How many diffs do you take in between recapturing a full image? At $DAYJOB we always take full backups into a fast in-house deduplicating store.

We always diff against the same base and have 5 days in between subsequent full dumps. The number of days just comes from a trade off between space occupied by the backups and time it takes to generate them.

> Is there no better way to handling this than polling?

There's definitely different ways to approach this, we find polling works well for us. We also use the same database for crash recovery, so doing the assignments through it serves both purposes.

> Presumably you can only get this parallelism by disabling FK integrity. Is it re-enabled in the following VERIFY stage?

I'm not sure what you mean by parallelism through disabling FK integrity. Splitting the backup into its tables means we can restore a subset of tables instead of the entire backup. This allows us to load individual tables concurrently, but also not have to wait to load a massive database if all we need is a few small tables.

> I'm not sure what you mean by parallelism through disabling FK integrity.

Say you have a `user` table and a `post` table with `post.user_id` being a FOREIGN KEY on `user.user_id`. Without disabling FK integrity you would not be able to restore a post without restoring the user first. When restoring in parallel this might or might not work out.

Facebook (along with almost everyone else using MySQL at massive scale) doesn't use foreign keys.

They scale poorly in MySQL, and they lose a lot of purpose in a massively sharded environment anyway. For example, say you like a status post on Facebook, or friend another user. It's very unlikely that the liked status or friended user exists on the same shard as your account, and there's no way to enforce a foreign key relationship in an inherently non-distributed database like MySQL.

So instead integrity is handled at the application layer, with additional background processes to fix the occasional integrity problem and detect integrity anomalies.

I understood it to mean that if you restore table A and table B in parallel, if there is a foreign key between them, then referential integrity checks would cause one of the loading operations to fail. How do you deal with that?
It's likely that foreign key checks aren't handled at the RDBMS level, but rather at the application level.