Hacker News
by StreamBright 2341 days ago
>> Or, to put that another way: what are AWS and GCP using in their SANs (EBS; GCE PD) that allows them to take on-demand incremental snapshots of SAN volumes, and then ship those snapshots away from the origin node into safer out-of-cluster replicated storage (e.g. object storage)?

As far as I know, AWS does not use SANs because they consider them an anti-pattern. Most backups land on S3 because of its reliability and price.
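The pattern in this exchange, incremental block snapshots shipped off to object storage, can be sketched roughly as follows. This is a minimal toy under stated assumptions, not how EBS snapshots are documented to work. It detects changed blocks by content hash (SHA-256), and it uses a plain dict as a stand-in for an object store such as S3. `BLOCK_SIZE`, `take_snapshot`, and `restore` are hypothetical names. Only blocks not already in the store get "uploaded", so a second snapshot of a mostly unchanged volume costs only the changed blocks.

```python
import hashlib
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # hypothetical and tiny so the demo is readable; real systems use far larger blocks


def split_blocks(volume: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a volume image into fixed-size blocks (the last one may be shorter)."""
    return [volume[i:i + block_size] for i in range(0, len(volume), block_size)]


def take_snapshot(volume: bytes, object_store: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Snapshot `volume` into `object_store`; return (manifest, blocks_uploaded).

    Blocks are content-addressed by SHA-256, so a block already present from an
    earlier snapshot is referenced again instead of being re-uploaded.
    """
    manifest: List[str] = []
    uploaded = 0
    for block in split_blocks(volume):
        key = hashlib.sha256(block).hexdigest()
        if key not in object_store:
            object_store[key] = block  # the only "upload" step in this toy
            uploaded += 1
        manifest.append(key)
    return manifest, uploaded


def restore(manifest: List[str], object_store: Dict[str, bytes]) -> bytes:
    """Rebuild a volume image by concatenating its blocks in manifest order."""
    return b"".join(object_store[key] for key in manifest)


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    snap1, n1 = take_snapshot(b"AAAABBBBCCCCDDDD", store)
    snap2, n2 = take_snapshot(b"AAAABBBBXXXXDDDD", store)  # one block changed
    print(n1, n2)  # 4 1
    assert restore(snap1, store) == b"AAAABBBBCCCCDDDD"
```

Content hashing is used here only because it keeps the sketch self-contained; a block device could instead track which blocks were written since the last snapshot and skip hashing entirely.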


SAN and S3 are different beasts.

EBS is very much a SAN. If you read the docs, the Nitro HBA controllers have dedicated bandwidth allocated just for EBS.

As there is a dedicated network for just servicing block storage, that sounds suspiciously like a Storage Area Network to me.

S3 for backup makes lots of sense: it's ubiquitous, reliable, and replicated across multiple availability zones. It also works well with large files, and it's several times cheaper per GB than EBS to run.

Sure thing. I was referring to the lack of SAN in the context of backups. Yes, EBS is a SAN in that sense.
So how is S3 implemented? Does it reuse any publicly available open source component?
I don’t think they’ve published anything specifically on S3’s architecture (someone please correct me if I’m wrong; I last looked into this a long time ago), but

1. they came out with S3 in 2006, only about a year before publishing their Dynamo paper (and years before releasing DynamoDB); and

2. there’s a good constructive proof, in the form of a studyable FOSS system, of how to build object storage on top of a Dynamo architecture: Riak CS (object storage), which is built atop Riak KV (a Dynamo implementation). Riak CS seems to make pretty much the same set of guarantees as S3 (in terms of time/space complexity of operations, possible durability numbers per scaled number of copies, etc.), so it’s a fair guess that they’re similarly architected systems.
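To make item 2 a bit more concrete, here is a toy sketch of object storage layered on a Dynamo-style replicated key-value store. It is an illustration under stated assumptions, not how S3 or Riak CS is actually implemented. `Ring`, `ToyObjectStore`, the MD5-based ring, the block size, and the `name/block/i` plus `name/manifest` key layout are all hypothetical. It shows two ideas. First, a consistent-hash ring with virtual nodes assigns every key to N distinct replica nodes (the "preference list" from the Dynamo paper). Second, an object layer splits each object into blocks plus a manifest, all stored as ordinary KV entries.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_position(key: str) -> int:
    """Map a string to a point on a 2**32 hash ring (first 4 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class Ring:
    """Consistent-hash ring with virtual nodes (hypothetical, Dynamo-style)."""

    def __init__(self, nodes: List[str], vnodes_per_node: int = 8):
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._hashes = [h for h, _ in self._points]

    def preference_list(self, key: str, n: int) -> List[str]:
        """First n distinct nodes found walking clockwise from the key's position."""
        start = bisect.bisect(self._hashes, ring_position(key))
        nodes: List[str] = []
        for i in range(len(self._points)):
            node = self._points[(start + i) % len(self._points)][1]
            if node not in nodes:
                nodes.append(node)
                if len(nodes) == n:
                    break
        return nodes


class ToyObjectStore:
    """Hypothetical object layer: objects become blocks plus a manifest in a replicated KV."""

    def __init__(self, nodes: List[str], n_replicas: int = 3, block_size: int = 4):
        self.ring = Ring(nodes)
        self.kv: Dict[str, dict] = {node: {} for node in nodes}  # one dict per node stands in for its KV backend
        self.n = n_replicas
        self.block_size = block_size

    def _put(self, key: str, value) -> None:
        # Write the key to every node in its preference list.
        for node in self.ring.preference_list(key, self.n):
            self.kv[node][key] = value

    def _get(self, key: str):
        # Read from the first replica that still has the key.
        for node in self.ring.preference_list(key, self.n):
            if key in self.kv[node]:
                return self.kv[node][key]
        raise KeyError(key)

    def put_object(self, name: str, data: bytes) -> None:
        blocks = [data[i:i + self.block_size] for i in range(0, len(data), self.block_size)]
        for i, block in enumerate(blocks):
            self._put(f"{name}/block/{i}", block)
        # The manifest records how many blocks make up the object.
        self._put(f"{name}/manifest", len(blocks))

    def get_object(self, name: str) -> bytes:
        count = self._get(f"{name}/manifest")
        return b"".join(self._get(f"{name}/block/{i}") for i in range(count))
```

Because each block and manifest lives on N distinct nodes, losing any single node still leaves every piece of an object readable. Durability then scales with the number of copies, which is the property the comment compares between Riak CS and S3.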

It is a closed-source project with many components. I am not aware of any of them being open source.