by cyberax 31 days ago
> My understanding is that different AWS accounts have different mappings of availability zones, so it's very easy to suddenly find yourself with an unexpected bandwidth bill due to all the cross-az traffic.

As far as I remember, accounts within the same organization share the same mapping. These days you can also use the stable AZ IDs (e.g. use1-az1) instead of the per-account zone names.
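To make the mapping issue concrete, here is a minimal Python sketch. It assumes the response shape of the EC2 `describe_availability_zones` call, where each entry carries a per-account `ZoneName` and a stable `ZoneId`. The two sample responses and the helper names are hypothetical, made up only for illustration, and `fetch_zone_mapping` needs boto3 plus valid credentials to actually run.

```python
def zone_name_to_id(response):
    """Build {ZoneName: ZoneId} from a describe_availability_zones response."""
    return {az["ZoneName"]: az["ZoneId"] for az in response["AvailabilityZones"]}


def fetch_zone_mapping(region_name):
    """Query the live mapping for the current account (needs boto3 and credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    ec2 = boto3.client("ec2", region_name=region_name)
    return zone_name_to_id(ec2.describe_availability_zones())


def same_physical_zone(map_a, name_a, map_b, name_b):
    """Two zone names refer to the same zone only if their zone IDs match."""
    return map_a[name_a] == map_b[name_b]


# Hypothetical responses from two unrelated accounts (illustrative values only).
account_a = {"AvailabilityZones": [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az1"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az2"},
]}
account_b = {"AvailabilityZones": [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az2"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az1"},
]}

map_a = zone_name_to_id(account_a)
map_b = zone_name_to_id(account_b)

# Same name in both accounts, but the IDs show they are different zones.
print(same_physical_zone(map_a, "us-east-1a", map_b, "us-east-1a"))
# Different names, same zone ID: traffic between these stays inside one AZ.
print(same_physical_zone(map_a, "us-east-1a", map_b, "us-east-1b"))
```

Comparing zone IDs rather than names is what avoids the surprise cross-AZ traffic when services live in different accounts.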

And yeah, egress traffic pricing is freaking insane at this point. It's the biggest reason to NOT use AWS.

S3 storage charges are insanely high too. $23/TB/month? Even with the insane HDD pricing that we see today, that pays off a drive (at retail) in 1 month, and the drive will last for 50-100 months. Sure, there's probably some encoding overhead, but it's still mad.

S3 is pretty competitive if you want similarly performing storage with consistent millisecond-level latency, high scalability, and at least 3x redundancy. Try looking at how much it's going to cost you in enterprise SSDs :)

Is it 3x redundancy forever? I always just kinda assumed it was RS (Reed-Solomon) encoded after a while, so only 30-50% larger than a single copy. Plus, almost all object storage is written to and read from hard disks, not SSDs, unless the data sits in a caching layer.

I know Azure has done a bunch of work around Pyramid Codes (essentially a locally repairable EC/RS variant), and Google obviously has the Colossus infrastructure that allows variable encodings. I'd be surprised if AWS is still triple-replicated everywhere.

Yes, and S3 is multiply redundant and is designed to survive a total AZ failure. So your data is physically replicated into at least 2 different AZs and might be multiply redundant within them. They also advertise a crazy durability target (eleven nines), meaning that data should effectively never be lost.

S3 also has a reduced redundancy tier and infrequent access tiers that are quite a bit cheaper.

It _is_ expensive, but once you crunch all the numbers, it's actually not unreasonable. I'd argue that using the real S3 is overkill for most scenarios that don't need infinite scalability.

GCS and Azure can survive a total AZ failure too. Think of 3x replication as RAID-1, whereas erasure coding is more like RAID-6, only it's actually more resilient:

Let's say you have 10 data blocks and 4 parity blocks. You can now lose any 4 servers that hold a block and still repair the data, whereas with 3x replication you can only lose 2 copies, and you have to store 300% of the original size instead of only 140%.
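The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model: it treats n-way replication as 1 data block plus n-1 extra copies, and the function names are made up for illustration.

```python
def storage_overhead(data_blocks, parity_blocks):
    """Stored size relative to the original data (1.4 means 140%)."""
    return (data_blocks + parity_blocks) / data_blocks


def replication_as_code(copies):
    """Model n-way replication as 1 data block plus (n - 1) redundant copies."""
    return 1, copies - 1


schemes = {
    "10+4 erasure coding": (10, 4),
    "3x replication": replication_as_code(3),
}

for label, (k, m) in schemes.items():
    # The number of redundant blocks is also the number of losses tolerated.
    print(f"{label}: stores {storage_overhead(k, m):.0%}, tolerates {m} lost blocks")
```

The erasure-coded layout tolerates twice as many failures while storing less than half as many bytes, which is the whole point of the comparison.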

And yes, it is unreasonable how much they charge for both storage and inter-az bandwidth.

The problem with erasure coding is that if a disk fails, you suddenly need to read 3-5x more data to reconstruct the missing data from parity blocks. This is especially problematic if your replicas are split across zones. The inter-zonal bandwidth is large, but not infinite.
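A rough Python sketch of the repair cost, under simplifying assumptions: a plain Reed-Solomon layout with k data blocks reads k surviving blocks to rebuild one lost block, a replica reads one copy, and a locally repairable code reads only the blocks in its local group. The group size of 5 and the 16 TB disk are hypothetical numbers picked for illustration, not anything a provider has published.

```python
def repair_read_blocks(data_blocks, local_group_size=None):
    """Blocks read to rebuild ONE lost block.

    Plain Reed-Solomon: any `data_blocks` surviving blocks are needed.
    Locally repairable code: the other members of the local group plus its
    local parity, i.e. `local_group_size` blocks in total.
    """
    if local_group_size is not None:
        return local_group_size
    return data_blocks


def repair_traffic_tb(failed_disk_tb, blocks_read_per_lost_block):
    """Total data read to rebuild everything that was on a failed disk."""
    return failed_disk_tb * blocks_read_per_lost_block


FAILED_DISK_TB = 16  # hypothetical disk size

layouts = {
    "3x replication": repair_read_blocks(1),           # copy from one replica
    "RS 10+4": repair_read_blocks(10),                 # read 10 surviving blocks
    "LRC, local groups of 5": repair_read_blocks(10, local_group_size=5),
}

for label, reads in layouts.items():
    traffic = repair_traffic_tb(FAILED_DISK_TB, reads)
    print(f"{label}: reads {reads}x the lost data, {traffic} TB for one failed disk")
```

If the blocks of a stripe are spread across zones, most of that repair traffic crosses zone boundaries, which is the concern raised above.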

So I'm pretty sure that you need to have at least 2 full copies in different AZs, and then likely at least some additional redundancy within a single AZ (in the form of erasure codes or a full mirror).

So that's at least 3-4x the amount of data. 1 TB of NVMe SSD capacity is around $200, and with 3x redundancy that's $600, or about 2 years of S3 storage at $23/TB/month. As I said, it's expensive but not unreasonable.
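For anyone who wants to plug in their own numbers, here is the same estimate as a small Python sketch. The prices are the ones quoted in this thread ($23/TB/month for S3, roughly $200/TB for NVMe), and the model counts raw drive cost only, so servers, power, networking and staff are all assumed to be zero.

```python
S3_USD_PER_TB_MONTH = 23.0   # price quoted earlier in the thread
NVME_USD_PER_TB = 200.0      # rough retail figure from the comment above


def breakeven_months(hw_usd_per_tb, redundancy, s3_usd_per_tb_month):
    """Months of S3 fees that equal the raw drive cost for 1 TB of user data."""
    return hw_usd_per_tb * redundancy / s3_usd_per_tb_month


for redundancy in (3, 4):
    months = breakeven_months(NVME_USD_PER_TB, redundancy, S3_USD_PER_TB_MONTH)
    drive_cost = NVME_USD_PER_TB * redundancy
    print(f"{redundancy}x redundancy: ${drive_cost:.0f} of drives "
          f"= {months:.1f} months of S3 ({months / 12:.1f} years)")
```

With 3x redundancy the break-even lands a little past two years, in line with the estimate above; at 4x it moves closer to three.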