Hacker News new | ask | show | jobs
by sintaks 5043 days ago
Amazon has ridiculous internal bandwidth. The costly bit is external. The time delay is largely internal buffer time - they need to pull your data out of Glacier (a somewhat slow process) and move it to staging storage. Their staging servers can handle the load, even at peak. GETs are super easy for them, and given that you'll be pulling down a multi-TB file via the Internet, your request will likely span multiple days anyhow - through multiple peaks/non-peaks.
1 comments

I was referring to the external bandwidth. Even if pulling down a request takes hours, forcing them to start off peak will significantly shift the impact of the incremental demand. I'm guessing that most download requests won't be for your entire archive - someone might have multiple months of rolling backups on Glacier, but it's unlikely they'd ever retrieve more than one set at a time. And in some cases, you might only be retrieving the data for a single use or drive at a time, so it might be 1TB or less. A corporation with fiber could download that in a matter of hours or less.
I get it - but I'm arguing that the amount of egress traffic Glacier customers (in aggregate) are likely to drive is nothing in comparison to what S3 and/or EC2 already does (in aggregate). They'll likely contribute very little to a given region's overall peakiness.

That said - the idea is certainly sound. A friend and I had talked about ways to incentivize S3 customers to do their inbound and outbound data transfers off-peak (thereby flattening it). A very small percentage of the customers drive peak, and usually by doing something they could easily time-shift.