It seems to me that your original idea was actually the correct one: rsync probably would have been the best way to do this (and, separately, a truck full of disks would probably have been the other best way).

First, rsync probably took too long because you ran it as a single process and didn't tune your command-line options. Most of rsync's performance problems with large filesystem trees come from using one command to do everything, something like:

    rsync -av /source/giant/tree/ /dest/giant/tree/

The process of crawling, checksumming and storing is not only slow in general, it is also very inefficient on modern multicore processors, because a single rsync keeps only a small part of the machine busy. It's much better to break the job up into many rsync processes running in parallel, something like:

    rsync -av /source/giant/tree/subdir1/ /dest/giant/tree/subdir1/ &
    rsync -av /source/giant/tree/subdir2/ /dest/giant/tree/subdir2/ &
    rsync -av /source/giant/tree/subdir3/ /dest/giant/tree/subdir3/ &
    wait

(The trailing slashes on the source paths matter: without them, rsync copies the directory itself and you end up with /dest/giant/tree/subdir1/subdir1.)

That alone probably would have sped things up dramatically, but you still have the speed-of-light problem of pushing 9TB over the network. This is where Amazon Import/Export comes in: do a one-time tar/rsync of your data to an external 9TB array, ship it to Amazon, have them import it into S3, and then load it onto your Amazon machines. You now have two copies of your data: one in S3 and one on your Amazon machine.

Then you run your optimized rsync to bring that copy up to a relatively consistent state. For example, if the catch-up run takes 8 hours, you are now only 8 hours behind. Finally, you take a brief downtime and run the optimized rsync one more time, and now you have two fully consistent filesystems.

No need for DRBD and all the rest of this: just rsync and an external array. I've used this method to copy many terabytes of data around, including tens of millions of small files. It works, and it has a lot fewer moving parts than DRBD.