Hacker News
by _fat_santa 462 days ago
S3 is up there as one of my favorite tech products ever. Over the years I've used it for all sorts of things but most recently I've been using it to roll my own DB backup system.

One of the things that shocks me about the system is the level of object durability. A few years ago I was taking an AWS certification course and learned that their durability number means that one can expect to lose data about once every 10,000 years. Since then anytime I talk about S3's durability I bring up that example and it always seems to convey the point for a layperson.

And its "simplicity" is truly elegant. When I first started using S3 I thought of it as a dumb storage location, but as I learned more I realized that it had some wild features that they all managed to hide properly. When you start it looks like a simple system, but you can gradually get deeper and deeper until your S3 bucket is doing some pretty sophisticated stuff.

Last thing I'll say is, you know your API is good when "S3 compatible API" is a selling point of your competitors.


> Last thing I'll say is, you know your API is good when "S3 compatible API" is a selling point of your competitors.

Counter-point: You know that you're the dominant player. See: .psd, .pdf, .xlsx. Not particularly good file types, yet widely supported by competitor products.

Photoshop, PDF and Excel are all products that were truly much better than their competitors at the time of their introduction.

Every file format accumulates cruft over thirty years, especially when you have hundreds of millions of users and you have to expand the product for use cases the original developers never imagined. But that doesn’t mean the success wasn’t justified.

PDF is not a product. I get what you are saying, but I can’t say that I’ve ever liked Adobe Acrobat.
PDF is a product, just like PostScript was.
Most people use libraries to read and write the files, and judge them pretty much entirely by popularity.

A very popular file format pretty much defines the semantics and feature set for that category in everyone's mind, and if you build around those features, then you can probably expect good compatibility.

Nobody thinks about the actual on disk data layout, they think about standardization and semantics.

I rather like PDF, although it doesn't seem to be well suited for 500MB scans of old books and the like, they really seem to bog down on older mobile devices.

It’s designed for that level of durability, but a single bad change or a correlated set of hardware failures can quickly invalidate the theoretical durability model. Data corruption is possible too.
You're totally correct, but these products also need to be specifically designed against these failure cases (i.e. it's more than just MTTR + MTTF == durability). You (of course) can't just run deployments without validating that the durability property is satisfied throughout the change.
Yep! There’s a lot of checksum verification, carefully orchestrated deployments, hardware diversity, erasure code selection, the list goes on and on. I help run a multi-exabyte storage system - I’ve seen a few things.
This is true. While I prefer non-SaaS solutions generally, S3 is something that’s hard to replace cost-effectively. I can set up an AWS account, create an S3 bucket, and have a system that can then persist at least one copy of my data to at least two data centers each within a goal of 1 second. And then layer cross-region replication if I need.

It’s by no means impossible to do that yourself, but it costs a lot more in time and upfront expense.

I've used it for server backups too, just a simple webserver. Built a script that takes the webserver files and config files, makes a database dump, packages it all into a .tar.gz file on Monday mornings, and uploads it to S3 using a "write only into this bucket" access key. In S3 I had it set up so that it sends me an email whenever a new file is added, and anything older than 3 weeks is put into cold storage.

Of course, I lost that script when the server crashed, the one thing I didn't back up properly.
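
For anyone wanting to rebuild something like that, here is a rough sketch of the archive-and-upload step. It is not the original script (that one is gone). The function names, file names, and bucket are made up for illustration, the database dump is left out because the comment doesn't say which database was involved, and "cold storage" is assumed to mean the GLACIER storage class.

```python
import tarfile
import time
from pathlib import Path

# Hypothetical lifecycle rule: move objects to cold storage after 3 weeks.
# It would be passed to boto3's put_bucket_lifecycle_configuration as
# LifecycleConfiguration=LIFECYCLE. GLACIER is an assumption.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "cold-after-3-weeks",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Transitions": [{"Days": 21, "StorageClass": "GLACIER"}],
        }
    ]
}


def build_backup(sources, out_dir, stamp=None):
    """Pack the given files/directories into one dated .tar.gz and return its path."""
    stamp = stamp or time.strftime("%Y-%m-%d")
    archive = Path(out_dir) / f"backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for src in map(Path, sources):
            # arcname keeps absolute paths out of the archive
            tar.add(src, arcname=src.name)
    return archive


def upload_backup(archive, bucket):
    """Upload the archive with boto3, using whatever credentials are configured."""
    import boto3  # imported lazily so the archive step works without boto3

    boto3.client("s3").upload_file(str(archive), bucket, Path(archive).name)
```

Run it from cron on Monday mornings, and keep a copy of the script itself somewhere other than the server.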

If you haven’t already, make sure that versioning is enabled on that bucket!
> but as I learned more I realized that it had some wild features that they all managed to hide properly so when you start it looks like a simple system but you can gradually get deeper and deeper until your S3 bucket is doing some pretty sophisticated stuff.

Over the years working on countless projects I’ve come to realize that the more “easy” something looks to an end user, the more work it took to make it that way. It takes a lot of work to create and polish something to the state where you’d call it graceful, elegant and beautiful.

There are exceptions for sure, but often times hidden under every delightful interface is an iceberg of complexity. When something “just works” you know there was a hell of a lot of effort that went into making it so.

I did a GCP training a while back, and the anecdote from one of the trainers was that the Cloud Storage team (GCP’s S3-compatible product) hadn’t lost a single byte of data since GCS had existed as a product. Crazy at that scale.
Well, Google Cloud has destroyed entire accounts, but I suppose that's not a storage failure per se.
Here's the link to that https://news.ycombinator.com/item?id=40304666

Moral of the story: the "technical part" of things is not the end of the story

Alternative moral: The 3-2-1 backup rule of thumb is still alive and well (is your cloud account a single point of failure?)

Related: https://www.infoworld.com/article/2179073/murder-in-the-amaz...

Eh, they have lost a bit
> their durability number means that one can expect to lose data about once every 10,000 years

What does that mean? If I have 1 million objects, I lose 100 per year?

What it means is in any given year, you have a 1 in 10,000 chance that a data loss event occurs. It doesn’t stack like that.

If you had light bulbs that lasted 1,000 hrs on average, and you had 10k light bulbs, and turned them all on at once, then they would all last 1,000 hours on average. Some would die earlier and some later, but the top line number does not tell you anything about the distribution, only the average (mean). That’s what MTTF is: the average time a given part runs before it fails. It doesn’t tell you if the distribution of light bulbs burning out is 10 hrs or 500 hrs wide. If it’s the latter, you’ll start seeing bulbs out within 750 hrs, but if the former it’d be 995 hrs before anything burned out.
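
A quick way to see the light bulb point is to simulate it. This is only a sketch: it assumes lifetimes are spread uniformly over a window centred on the mean, which is one simple reading of "10 hrs or 500 hrs wide", and `simulate_bulbs` is a made-up helper name.

```python
import random
import statistics


def simulate_bulbs(mean_hours, width_hours, count, seed=0):
    """Draw lifetimes uniformly from a window of the given width centred on the mean."""
    rng = random.Random(seed)
    lo = mean_hours - width_hours / 2
    hi = mean_hours + width_hours / 2
    return [rng.uniform(lo, hi) for _ in range(count)]


for width in (10, 500):
    lifetimes = simulate_bulbs(1000, width, 10_000)
    print(
        f"width {width:>3} h: mean {statistics.mean(lifetimes):7.1f} h, "
        f"first failure {min(lifetimes):6.1f} h"
    )
```

Both runs report a mean close to 1,000 hrs, but the first failure shows up just after 995 hrs in the narrow case and just after 750 hrs in the wide one.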

Isn't it just a marketing number? I didn't think durability was part of the S3 SLA, for example.
Object integrity isn’t part of the S3 SLA. I assume that is mostly because object integrity is something AWS can’t know about per se.

You could unknowingly upload a corrupted file, for example. By the time you discover that, there may not be a clear record of operations on that object. (Yes, you can record S3 data plane events but that’s not the point.)

Only the customer would know if their data is intact, and only the customer can ensure that.

The best S3 (or any storage system) can do is say “this is exactly what was uploaded”.

And you can overwrite files in S3 with the appropriate privileges. S3 will do what you ask if you have the proper credentials.

Otherwise, S3 is designed to be self-healing with erasure encoding and storing copies in at least two data centers per region.

S3 supports checksumming, you just need to provide a hash in a header when you upload an object.
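
A minimal sketch of what "provide a hash in a header" looks like in practice, assuming the Content-MD5 and x-amz-checksum-sha256 headers. Both carry the base64 encoding of the raw digest, not the hex string. The helper names are invented for this example.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64 of the raw MD5 digest, the format of the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def checksum_sha256(body: bytes) -> str:
    """Base64 of the raw SHA-256 digest, as used by x-amz-checksum-sha256."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


body = b"hello"
headers = {
    "Content-MD5": content_md5(body),
    "x-amz-checksum-sha256": checksum_sha256(body),
}
# With boto3 the same values would go in the ContentMD5 / ChecksumSHA256
# arguments of put_object; the upload is rejected if they do not match the body.
print(headers)
```

This only proves that what arrived is what was sent. It says nothing about whether the file was already corrupt before you hashed it.
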
Yes but my point stands. If AWS added S3 data integrity to the SLA then it’s now made that commitment contractually. If you add checksum data the checksums would (logically) be required and also be in scope of the SLA. If there was a mismatch between them and the file functioned it would be impossible to sanely adjudicate who is responsible for the discrepancy, or what the nature of that discrepancy might be if no other copies of the file exist.

AWS probably doesn’t want those risks and ambiguities.

Amazon claims 99.999999999% durability.

If you have ten million objects, you should lose one every 10k years or so.
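
The arithmetic behind that, as a small sketch. It assumes the eleven nines are read as an annual per-object figure and that losses are independent. The function names are just for illustration.

```python
def expected_annual_losses(objects: int, nines: int) -> float:
    """Expected number of objects lost per year at the given number of nines."""
    annual_loss_probability = 10.0 ** -nines  # 11 nines -> 1e-11 per object per year
    return objects * annual_loss_probability


def years_per_lost_object(objects: int, nines: int) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(objects, nines)


for count in (1_000_000, 10_000_000):
    print(f"{count:>10,} objects: one loss every "
          f"{years_per_lost_object(count, 11):,.0f} years on average")
```

Ten million objects gives one loss per 10,000 years. One million objects gives one loss per 100,000 years, which is nowhere near 100 per year.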

How does that compare to competitors and things like distributed file systems?
I generally see object storage systems advertise 11 9s of durability. A commercial distributed file system would usually advertise less, trading durability for performance (obviously stuff like Ceph and Lustre will depend on your specific configuration).
In general if you actually do the erasure coding math, almost all distributed storage systems that use erasure coding will have waaaaay more than 11 9s of theoretical durability

S3's original implementation might have only had 11 9s, and it just doesn't make sense to keep updating this number, beyond a certain point it's just meaningless

Like "we have 20 nines" "oh yeah, well we have 30 nines!"

To give an example of why this is the case, if you go from a 10:20 sharding scheme to a 20:40 sharding scheme, your storage overhead is roughly the same (2x), but you have doubled the number of nines

So it's quite easy to get a ton of theoretical 9s with erasure coding
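
Here is a rough sketch of that math, under loud assumptions: "10:20" is read as 10 data shards out of 20 total, each shard fails independently with probability 1% before it can be repaired, and data is lost only when more shards fail than there are parity shards. Real systems have correlated failures, so treat the output as theoretical nines only.

```python
from math import comb, log10


def loss_probability(data_shards, total_shards, p_shard):
    """P(more shards fail than parity can absorb), assuming independent failures."""
    parity = total_shards - data_shards
    return sum(
        comb(total_shards, f) * p_shard**f * (1 - p_shard) ** (total_shards - f)
        for f in range(parity + 1, total_shards + 1)
    )


def nines(probability):
    """Number of nines corresponding to a loss probability."""
    return -log10(probability)


for k, n in ((10, 20), (20, 40)):
    print(f"{k}:{n} -> about {nines(loss_probability(k, n, 0.01)):.1f} nines")
```

With these assumptions the 10:20 scheme lands at roughly 17 nines and the 20:40 scheme at roughly 31, so a bit less than double for the same 2x storage overhead.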

it's really not that impressive, but you have to use erasure coding (chop the data D in X parts, use these to generate Y extra pieces, and store all X+Y of them) instead of replication (store D n times)
I’ve never worked with AWS, but have a certification from GCP and currently use Azure.

What do you see as special for S3? Isn’t it just another bucket?

The durability is not so good when you have a lot of objects
Why not? I don't work with web-apps or otherwise use object stores very often, but naively I would expect that "my objects not disappearing" would be a good thing.
I think their point is that you'd need even higher durability. With millions of objects, even 5+ nines means that you lose objects relatively constantly.
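
To put a number on "relatively constantly", here is a sketch of the chance of losing at least one object in a year. It assumes the nines are an annual per-object figure and that losses are independent. `p_any_loss` is a made-up name.

```python
from math import expm1, log1p


def p_any_loss(objects, nines):
    """Chance of losing at least one object in a year, assuming independent losses."""
    p_object = 10.0 ** -nines
    # 1 - (1 - p)^n, written with log1p/expm1 so a tiny p does not round away
    return -expm1(objects * log1p(-p_object))


for n in (5, 11):
    print(f"{n} nines, 1,000,000 objects: {p_any_loss(1_000_000, n):.6f}")
```

At five nines a million objects all but guarantees at least one loss per year (about ten expected), while at eleven nines the chance is about 1 in 100,000.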