by CherryJimbo
2630 days ago
We looked briefly at the Snowball and Fireball, but wanted to do this as quickly as possible, whilst keeping the process entirely transparent to our users. It was also an excuse for our team to get intimately familiar with the B2 API, since it's not compatible with S3. If we were to consider another large migration like this, physical media would probably be the way to go.
|
So, Snowball works in a lot of areas, but like so many AWS products, it works if you adapt to it.
pigz/scp/zstd work extremely fast inline.
In your case you're pulling from S3 to another object store.
I moved ~1PB from one S3 region to another. "Why not use replication," they asked. That only works if it's turned on when you upload the object - another fine-print 'gotcha' in the easy AWS service. Then you get into rate limits. In 2010 I asked AWS if I could spin up 1000 servers to test something - nope - elasticity at that level is for the big boys.
Now I work for a large cloud company and we still run into elasticity limits.
To move the 1PB from one S3 region to another we spun up hundreds of spot instances (oh, we were compressing and glacierizing it too) and built a Perl/MySQL batch job that ran a parallelized "s3 get | zstd | s3 put" process. One nice thing about S3 is that it gives you the MD5 hash as the object's ETag - unless the upload was multipart, in which case it's the hash of the part hashes, oh yeah.... So you should split it in advance if you want to verify the hash (more fine print).
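To make the "hash of the hash" fine print concrete, here is a minimal sketch of how the two kinds of ETag are commonly computed. It assumes plain uploads (no SSE-KMS or SSE-C, where the ETag is not an MD5 at all) and that you know the part size the uploader used. The function names `simple_etag` and `multipart_etag` are made up for illustration, not part of any AWS SDK.

```python
import hashlib


def simple_etag(data: bytes) -> str:
    """ETag for a single-request upload: the hex MD5 of the whole object."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """ETag as commonly observed for a multipart upload.

    Each part is hashed with MD5, the raw (binary) digests are concatenated,
    that concatenation is hashed again, and "-<number of parts>" is appended.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Split the object into fixed-size parts; the last part may be shorter.
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)] or [b""]
    # Concatenate the binary digests of the parts, not their hex strings.
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"
```

To verify a copy, you would compare the result against the ETag the store reports for the object. The multipart value only matches if you use the same part size as the original upload, which is why deciding the split in advance matters.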
Worked great. Good for you for sharing this project, very cool.