by diggs
3584 days ago
This approach works well enough for relatively small numbers of objects. Once you start getting into the millions (and significantly higher), it begins to break down. Every "sync" operation has to start from scratch, comparing source and target (possibly through an index) on a file-by-file basis. There are definitely faster ways of doing it that scale to much larger object counts, but they have their own drawbacks. It's a shame the S3 API doesn't let you order by modified date, or this would be trivial to do efficiently.
the main innovations in s3s3mirror are (1) understanding this and going for massive parallelism to speed things up, and (2) where possible, comparing etag/metadata instead of all bytes.
so far it has scaled pretty well; i know of no faster tool to synchronize buckets with millions of objects.
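
To make the two ideas above concrete, here is a rough Python sketch. It is not s3s3mirror's actual code. It assumes both bucket listings have already been fetched into in-memory dicts, and `ObjectMeta`, `keys_to_copy`, `mirror`, and the `copy_object` callback are hypothetical names made up for illustration. It compares etag and size rather than object bytes, then fans the copies out over a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, NamedTuple


class ObjectMeta(NamedTuple):
    """Hypothetical per-object metadata taken from a bucket listing."""
    etag: str
    size: int


def keys_to_copy(source: Dict[str, ObjectMeta],
                 target: Dict[str, ObjectMeta]) -> List[str]:
    """Return source keys that are missing from target or whose metadata differs."""
    return sorted(key for key, meta in source.items() if target.get(key) != meta)


def mirror(source: Dict[str, ObjectMeta],
           target: Dict[str, ObjectMeta],
           copy_object: Callable[[str], None],
           workers: int = 32) -> List[str]:
    """Copy every out-of-date key using a pool of worker threads."""
    pending = keys_to_copy(source, target)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so an exception raised by a copy surfaces here.
        list(pool.map(copy_object, pending))
    return pending


if __name__ == "__main__":
    src = {"a": ObjectMeta("e1", 10), "b": ObjectMeta("e2", 20)}
    dst = {"a": ObjectMeta("e1", 10)}
    mirror(src, dst, lambda key: print("copying", key), workers=4)
```

Note this still walks the whole source listing on every run, which is the scaling limit the parent comment describes; parallelism only hides the per-object latency. Also, an etag is not always a plain MD5 of the content (multipart uploads are one example), so matching etags are a shortcut rather than proof of identical bytes.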