by ak217
1902 days ago
The AWS Go SDK now has a connection-pool-based S3 download/upload manager API that lets you saturate your network connection (e.g., a 40 Gbit/s link between EC2 and S3) using far less memory and CPU than is possible with Python. A colleague of mine developed this tool to make that functionality available as a CLI: https://github.com/chanzuckerberg/s3parcp
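
The usual way such a manager gets its throughput is to split one object into byte ranges and fetch them concurrently over several connections. The following is a minimal sketch of that idea in Python (to match the boto3 comparison), not the Go SDK's actual implementation. `fetch_range` is a hypothetical callback; with boto3 it could wrap `s3.get_object(Bucket=..., Key=..., Range=f"bytes={start}-{end}")["Body"].read()`. The part size and worker count are illustrative assumptions, not tuned values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical callback type: fetch_range(start, end) returns bytes start..end inclusive,
# mirroring the inclusive "bytes=start-end" HTTP Range header.
FetchRange = Callable[[int, int], bytes]


def byte_ranges(size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) ranges of at most part_size bytes."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]


def parallel_ranged_download(size: int, fetch_range: FetchRange,
                             part_size: int = 8 * 1024 * 1024,
                             max_workers: int = 16) -> bytes:
    """Fetch an object of known size as concurrent ranged reads and reassemble it."""
    buf = bytearray(size)  # preallocated; each worker fills a disjoint slice

    def fetch_part(rng: Tuple[int, int]) -> None:
        start, end = rng
        chunk = fetch_range(start, end)
        if len(chunk) != end - start + 1:
            raise IOError(f"short read for bytes={start}-{end}")
        buf[start:end + 1] = chunk

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so any worker exception is re-raised here.
        list(pool.map(fetch_part, byte_ranges(size, part_size)))
    return bytes(buf)
```

This version buffers the whole object in memory for simplicity; a production transfer manager would typically write each part straight to its offset in a file instead.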
Just last week I wrote essentially the same thing as an ad-hoc solution using boto3, because I had tens of TB of data to pull out of Glacier and distribute across S3 buckets. It wasn't a big deal: I'm experienced with writing parallel network code in Python and keeping large data streams flowing, and boto3 has good documentation. But things like this really shouldn't be left as an exercise for the SDK consumer.
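
An ad-hoc bulk job like this can be sketched, under assumptions, as a bounded thread pool that fans out one copy per key and collects failures instead of aborting the whole run. `copy_one` is a hypothetical callback; with boto3 it might call the client's managed `copy(CopySource={"Bucket": src, "Key": key}, Bucket=dst, Key=key)` method. Glacier restores (`restore_object`) and retry logic are deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def copy_all(keys: Iterable[str], copy_one: Callable[[str], None],
             max_workers: int = 32) -> Dict[str, BaseException]:
    """Run the hypothetical copy_one(key) for every key concurrently.

    Returns {key: exception} for the keys that failed; successes are not recorded.
    """
    failures: Dict[str, BaseException] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its key so failures can be reported per object.
        futures = {pool.submit(copy_one, key): key for key in keys}
        for fut in as_completed(futures):
            exc = fut.exception()  # None if copy_one returned normally
            if exc is not None:
                failures[futures[fut]] = exc
    return failures
```

Submitting every key up front keeps one future per key in memory; for millions of keys you would feed the pool in batches.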