Hacker News
by m0xte 2478 days ago
Uploading to S3 is easy until it isn't. It works pretty nicely for one-offs, but when you have to blast a few hundred gigs into it, or push large files from a bit of software you wrote, it's a royal pain in the butt. The phrase "multipart upload" makes me cry inside.

Why is that hard? The CLI automatically does multipart uploads. It's also simple with the various AWS SDKs.
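For illustration, here is a minimal sketch of the SDK route in boto3, assuming boto3 is installed and credentials are configured. `upload_file` with a `TransferConfig` switches to multipart on its own once a file reaches `multipart_threshold`. The threshold, part size and concurrency values are arbitrary choices, and `upload_large_file` is a hypothetical helper name.

```python
MB = 1024 * 1024


def upload_large_file(client, path, bucket, key):
    """Upload a local file with boto3's managed transfer.

    Files at or above multipart_threshold are split into parts and uploaded
    on worker threads; smaller files go up in a single PUT.
    """
    # Imported here so the sketch can be loaded even where boto3 is absent.
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_threshold=64 * MB,  # switch to multipart at 64 MiB (arbitrary)
        multipart_chunksize=16 * MB,  # size of each part (arbitrary)
        max_concurrency=8,            # number of parts uploaded in parallel
    )
    client.upload_file(path, bucket, key, Config=config)
```

It would be called as `upload_large_file(boto3.client("s3"), "dump.tar", "my-bucket", "backups/dump.tar")`, where the file, bucket and key names are placeholders.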
It looks like it works, but it isn't reliable, so the reliability concerns get externalised into your application, which multiplies the complexity terribly.
What do you mean by "it's not reliable"?
I’ve never had the experience myself, but I would assume he means you would have to build in some type of retry logic in your script.

Just from a cursory glance, I couldn’t find any samples of how to do a multipart upload with retries in Python with Boto3.

This is an example of how to do multipart uploads though.

https://medium.com/@niyazi_erd/aws-s3-multipart-upload-with-...
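Since retries were the missing piece, here is a minimal sketch of a low-level multipart upload that retries each part with exponential backoff, assuming `client` is a boto3 S3 client (`boto3.client("s3")`). The calls `create_multipart_upload`, `upload_part`, `complete_multipart_upload` and `abort_multipart_upload` are real boto3 S3 client methods; the helper names, the 8 MiB part size and the retry counts are illustrative assumptions. S3 requires every part except the last to be at least 5 MiB. Botocore also has its own configurable per-call retries (`botocore.config.Config(retries={...})`), which could be used instead of or alongside the hand-rolled loop.

```python
import functools
import time

PART_SIZE = 8 * 1024 * 1024  # S3 requires parts of at least 5 MiB, except the last


def with_retries(call, attempts=5, base_delay=1.0):
    """Run call(), retrying with exponential backoff if it raises."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # in real code, catch botocore/network errors only
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


def multipart_upload_with_retries(client, fileobj, bucket, key,
                                  part_size=PART_SIZE, attempts=5, base_delay=1.0):
    """Upload a binary file object part by part, retrying each part on failure."""
    upload_id = client.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    try:
        part_number = 1
        while True:
            chunk = fileobj.read(part_size)
            if not chunk:
                break
            # The chunk is held in memory as bytes, so a failed part can be resent.
            upload = functools.partial(
                client.upload_part, Bucket=bucket, Key=key, UploadId=upload_id,
                PartNumber=part_number, Body=chunk)
            response = with_retries(upload, attempts, base_delay)
            parts.append({"PartNumber": part_number, "ETag": response["ETag"]})
            part_number += 1
        if not parts:
            raise ValueError("nothing to upload: file object is empty")
        complete = functools.partial(
            client.complete_multipart_upload, Bucket=bucket, Key=key,
            UploadId=upload_id, MultipartUpload={"Parts": parts})
        return with_retries(complete, attempts, base_delay)
    except Exception:
        # Abort so the already-uploaded parts don't linger (and get billed).
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

Retrying per part means a dropped connection costs one part rather than the whole upload, and the abort on failure keeps orphaned parts from piling up in the bucket.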

Fun trivia: when you do a multipart upload, the object's ETag (the "hash" S3 reports) is not the same as it would be for a single-part upload. I had a file with the same contents but a different hash when I uploaded it with Python than when it was transferred with the CLI or CloudBerry. The quick and dirty way to fix the hash is to copy the file to itself with Boto3.
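A sketch of why the hashes differ, assuming the commonly observed (but not officially guaranteed) ETag format: a single-PUT upload gets the hex MD5 of the body, while a multipart upload gets the MD5 of the concatenated per-part MD5 digests plus `-<number of parts>`. Because the result depends on the part size, tools that split the file differently would plausibly produce different ETags for identical contents. The self-copy uses the real boto3 calls `head_object` and `copy_object`; the helper names are hypothetical. A single `CopyObject` call is limited to 5 GB, this version carries over only user metadata and content type (headers such as Cache-Control would be dropped), and objects encrypted with SSE-KMS or SSE-C do not get MD5 ETags at all.

```python
import hashlib


def single_part_etag(data: bytes) -> str:
    """ETag for a plain single-PUT upload: the hex MD5 of the body."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """Observed multipart ETag: MD5 of the concatenated per-part MD5 digests,
    followed by '-' and the number of parts."""
    digests = [hashlib.md5(data[i:i + part_size]).digest()
               for i in range(0, len(data), part_size)]
    return f"{hashlib.md5(b''.join(digests)).hexdigest()}-{len(digests)}"


def rewrite_as_single_part(client, bucket, key):
    """Copy an object onto itself in one CopyObject call (objects up to 5 GB)."""
    head = client.head_object(Bucket=bucket, Key=key)
    return client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        # S3 refuses a self-copy that changes nothing, so replace the metadata
        # with the object's existing user metadata and content type.
        MetadataDirective="REPLACE",
        Metadata=head.get("Metadata", {}),
        ContentType=head.get("ContentType", "binary/octet-stream"),
    )
```

For example, the same 10-byte payload yields an ETag ending in `-3` with 4-byte parts and one ending in `-2` with 5-byte parts, while `single_part_etag` returns the plain MD5.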