Hacker News new | ask | show | jobs
by blyat 2153 days ago
I've only done a few 30+TB moves, largest being 50TB which had similar table structure - one table per machine holding calibration readings that were constantly streamed in from a MUCH beefier RabbitMQ instance. Old CPU, not a ton of cores, spinning disks, etc. Same base problem. I think the best thing to do here would be change the order of some things around.

1. Buy the hardware first with blazing drives and roaring CPUs and get it stood up before you start trying to change your schema on a potato

2. Do his last step now: set up an AG and let SQL Server do it's thing

3. Now go start planning your ETL/schema changes and writing scripts while you wait 1-3 days (or in my case 10 as we had to suspend during business hours) for data to migrate.

4. Fail-over and promote and start collecting data on your superbox

5. If the world didn't explode, you do the same thing at your other DC

6. Now, if you absolutely must, apply and test all your schema changes in production instead of dev first but enjoy the fact that those changes are exponentially faster!

You could also use log shipping, but with those restore times you'd have data loss I think. Haven't done that in years - I've been lucky enough to only have to deal with projects where we shutdown, restore, and do a full downtime migration. So much nicer...

1 comments

Having step 1 of your solution be "eliminate the hardware constraints that made this job painful" is kind of a cheat. We can all do a better job on any problem if we lift those constraints.
Hardware is way cheaper than 11 months salary + lost opportunity cost.
I think we all agree on that, but the question was not whether this was an efficiently managed migration. I think we can assume OP made a similar argument up the chain (he got them to add SSD storage, at least).