by dencold 4679 days ago
Long-time user of boto[1] here. It has been the go-to library for hooking your Python code into AWS and has a fairly active following on GitHub[2].

One API point that I've found lacking in boto is a "sync" command for S3. Take a source directory and a target bucket and push up the differences à la rsync, that's the dream. Boto gives you the ability to push/get S3 resources, but I've had to write my own sync logic.

So, the first thing I went digging into is the S3 interface of the new CLI, and to my surprise, they've put a direct sync command on the interface[3], huzzah! Their implementation is a little wacky though. Instead of using some computed hashes, they are relying on a combination of file modtimes and filesize. Weird.
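To make the trade-off concrete, here is a minimal sketch of the two comparison strategies, using only the Python standard library. The function names and the `remote_size` / `remote_mtime` arguments are hypothetical (values you would read from a bucket listing), and the "newer local file wins" rule is an assumption for illustration, not necessarily the exact rule the AWS CLI applies.

```python
import hashlib
import os


def quick_check_differs(local_path, remote_size, remote_mtime):
    """Cheap check: compare size and modification time only.

    remote_size (bytes) and remote_mtime (epoch seconds) are hypothetical
    inputs taken from a bucket listing; no S3 call is made here.
    """
    st = os.stat(local_path)
    # A different size means the content must have changed.
    if st.st_size != remote_size:
        return True
    # Same size: assume a change only if the local file is newer.
    return st.st_mtime > remote_mtime


def md5_hex(local_path, chunk_size=1024 * 1024):
    """Expensive check: hash the whole file, reading it in chunks."""
    digest = hashlib.md5()
    with open(local_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The quick check needs only one `stat` call per file, while the hash has to read every byte. That is presumably why a size-plus-modtime comparison is attractive as a default, even though it can miss an edit that keeps the size and carries an older timestamp.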

Anyways, glad to see AWS is investing in a consistent interface to make managing their services easier.

[1] http://boto.readthedocs.org/en/latest/index.html

[2] https://github.com/boto/boto

[3] http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html...

4 comments

That is good news. I too wrote a sync layer to sit above boto for a previous project. My use case is a little different in that I sync from S3 to Rackspace Cloud Files (CF) as a backup. I just use the file name (object name) as the key because I know that files never change (though files are added and removed). I create a complete object listing of S3 and a complete object listing of CF, diff them, and then sync.
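A minimal sketch of that diff step, assuming both listings are already in memory as plain lists of object names. `plan_sync` and its arguments are illustrative names, not taken from the project described above, and the listing and copy calls themselves are left out.

```python
def plan_sync(source_keys, target_keys):
    """Work out what to copy and what to delete from two key listings.

    Objects are assumed immutable, so a name match means "already synced".
    """
    source = set(source_keys)
    target = set(target_keys)
    # Present in the source but missing from the backup: copy these.
    to_copy = sorted(source - target)
    # Present in the backup but gone from the source: delete these.
    to_delete = sorted(target - source)
    return to_copy, to_delete


to_copy, to_delete = plan_sync(["a.jpg", "b.jpg", "c.jpg"], ["b.jpg", "c.jpg", "d.jpg"])
print(to_copy)    # ['a.jpg']
print(to_delete)  # ['d.jpg']
```

Because only names are compared, the diff itself is cheap. The cost is dominated by producing the two complete listings.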

One disappointing issue is that the listing process on CF is almost an order of magnitude faster than on S3.

    CF: real	2m7.628s
    S3: real	14m15.680s
Keep in mind that this is all being run from an EC2 box, so really, S3 should win hands down.

Good to hear this feedback. I work for AWS; I will pass this to the team. Feel free to shoot me an email: simone attt amazon do0tcom if you have more comments.

Thanks!

The rsync command uses a combination of file modtimes and file sizes as its default algorithm. It's very fast and efficient. I agree, though, that like rsync, it would be good to add a --checksum option to the s3 sync command in AWS CLI. Feel free to create an issue on our GitHub site https://github.com/aws/aws-cli so we can track that.

s3cmd has a sync option that will do this.

If you're looking for something a little more robust, I just released this a couple of days ago:

https://github.com/HeyImAlex/s3tup

It's still at a really early stage, but I've been using it to sync and configure my personal site, which is hosted on S3, and it's worked well so far.