Hacker News
by johnkoepi 3495 days ago
> 3) Transferring data from one host to another when you need to bring up a stateful process does not scale. Above a few gigs, the transfer process creates significant load on both the source and destination host, and will easily saturate the link between the two...

It depends. @falcolas, could you point out a few other solutions than that? (For live resharding, other than replication / copy / rsync / ...)


Copying data is always going to be expensive, but it can't be avoided.

The lightest-weight solution I've seen is restoring from a daily backup stored somewhere like S3, then setting the new host up as a slave of a live master so it can catch up on the day's binlogs. That's still a lot of data to move and load, but at least it isn't the entire contents of the DB.
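A toy Python model of the restore-then-catch-up approach, showing why only the changes since the backup have to cross the network from the live master. `Master`, `snapshot` and `build_replica` are hypothetical names. The in-memory list stands in for a binlog, and the sketch assumes the backup records the log position it corresponds to. A real MySQL replica needs the equivalent binlog coordinates or GTID set from the backup to know where to resume.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy primary: key/value state plus an append-only change log (a stand-in for the binlog)."""
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (key, value) writes; list index = log position

    def write(self, key, value):
        self.state[key] = value
        self.log.append((key, value))

    def snapshot(self):
        """Return a copy of the state plus the log position it matches (the 'daily backup')."""
        return dict(self.state), len(self.log)


def build_replica(backup_state, backup_pos, master):
    """Restore from the backup, then replay only the log entries written after backup_pos.

    Returns the replica state and how many log entries had to be fetched from the master.
    """
    replica = dict(backup_state)
    missing = master.log[backup_pos:]  # only the catch-up portion is pulled from the live master
    for key, value in missing:
        replica[key] = value
    return replica, len(missing)
```

In the test, 1,000 writes go into the backup and only the 20 writes made afterwards are replayed from the master's log, yet the replica ends up identical to the master.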

The best you can do is be in control of when data transfers happen, so you're doing them when it makes sense and not in the middle of your highest-traffic period (which is what frequently happens when you try to automatically scale DBs in response to load).
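One way to pick "when it makes sense" is to choose the quietest window from a traffic profile. This is a minimal sketch assuming you already have an average load figure per hour of day and a rough estimate of how many hours the transfer will take. `pick_transfer_window` is a hypothetical helper, not part of any tool, and the loads in the test are made-up numbers.

```python
def pick_transfer_window(hourly_load, hours_needed):
    """Return the start hour whose `hours_needed`-hour window has the lowest total load.

    hourly_load: one number per hour of day (e.g. average queries/sec); windows wrap past midnight.
    """
    n = len(hourly_load)
    if not 0 < hours_needed <= n:
        raise ValueError("hours_needed must be between 1 and len(hourly_load)")
    best_start, best_total = 0, float("inf")
    for start in range(n):
        # Sum the load over the window, wrapping around the end of the day.
        total = sum(hourly_load[(start + h) % n] for h in range(hours_needed))
        if total < best_total:
            best_start, best_total = start, total
    return best_start
```

A brute-force scan over 24 start hours is cheap and easy to check by hand. The wrap-around matters because the quietest stretch often straddles midnight.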

That sounds a bit like what Joyent is doing in their Autopilot Pattern implementation for MySQL: https://www.joyent.com/blog/dbaas-simplicity-no-lock-in

I'd add that one can use throttling to bring new databases online while sustaining peak traffic.

Some databases have a configurable limit in MB/s for replication, or you can assign disk/CPU quotas to the slave to slow it down.

Combine that with good planning and monitoring and you'll be fine =)
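Where a database has no built-in replication rate limit, the throttling idea can be approximated in the copy tool itself with a token bucket. This is a generic sketch of that technique, not how any particular database implements its limit. `TokenBucket` and `throttled_copy` are hypothetical names, and `send` is whatever actually ships the bytes. For scale: at a 50 MB/s cap, copying 100 GB takes about 100,000 MB / 50 MB/s = 2,000 s, roughly 33 minutes, in exchange for leaving headroom on the link and disks for live traffic.

```python
import time


class TokenBucket:
    """Byte-rate limiter: averages at most `rate_bytes_per_s`, with bursts up to `burst_bytes`."""

    def __init__(self, rate_bytes_per_s, burst_bytes, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate_bytes_per_s)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.clock = clock                # injectable so the pacing can be tested without waiting
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at the bucket size.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes):
        """Block until `nbytes` may be sent; a single chunk must not exceed burst_bytes."""
        if nbytes > self.capacity:
            raise ValueError("chunk larger than burst size")
        self._refill()
        while self.tokens < nbytes:
            # Sleep just long enough for the missing tokens to accumulate.
            self.sleep((nbytes - self.tokens) / self.rate)
            self._refill()
        self.tokens -= nbytes


def throttled_copy(chunks, bucket, send):
    """Pass each chunk (bytes) to `send`, pacing the stream with the token bucket."""
    for chunk in chunks:
        bucket.consume(len(chunk))
        send(chunk)
```

The burst size lets a short transfer start at full speed while keeping the long-run average at the configured rate. A database's own replication limit or an OS-level I/O quota is usually preferable when available, since it also covers writes the copy tool can't see.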