Hacker News
by FullyFunctional 1191 days ago
Forgive the question but I never quite understood the point of S3. It seems it’s a terrible protocol but it’s designed for bandwidth. Why couldn’t they have used something like, say, 9P or Ceph? Surely I’m missing something fundamental.

EDIT: In my personal experience with S3 it’s always been super slow.

13 comments

Because you don't have to allocate any fixed amount up front, and it's pay as you go. At the time when the best storage options you could get were fixed-size hard drives from VPS providers, this was a big change, especially on both the "very small" and "very large" ends of the spectrum. It has always spoken HTTP with a relatively straightforward request-signing scheme for security, so integration at the basic levels is very easy -- you can have signed GET requests, written by hand, working in 20 minutes. The parallel throughput (on AWS, at least) is more than good enough for the vast, vast majority of apps assuming they actually design with it in mind a little. Latency could improve (especially externally) but realistically you can just put an HTTP caching layer of some sort in front to mitigate that and that's exactly what everybody does.
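For a sense of what "signed GET requests, written by hand" can look like, here is a minimal sketch of a presigned GET URL using the current Signature Version 4 query-string scheme and only the Python standard library. The function name `presign_get`, the virtual-hosted `bucket.s3.amazonaws.com` host form, and the fixed timestamp argument are assumptions made for illustration; the credentials in the usage note are placeholder values, not real keys. Treat it as a sketch of the idea rather than a replacement for an SDK.

```python
import hashlib
import hmac
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get(bucket, key, access_key, secret_key, region, amz_date, expires=3600):
    """Build a presigned GET URL (SigV4, query-string auth).

    amz_date is an ISO 8601 basic timestamp such as '20130524T000000Z'.
    Hypothetical helper: the host form below is an assumption.
    """
    datestamp = amz_date[:8]
    host = f"{bucket}.s3.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"

    # Query parameters that carry the signing metadata, sorted by name.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_uri = "/" + quote(key, safe="/")

    # Canonical request: method, path, query, headers, signed headers, payload hash.
    canonical_request = "\n".join(
        ["GET", canonical_uri, canonical_query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )

    # Derive the signing key through the date/region/service HMAC chain.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), datestamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, "s3")
    k_signing = _hmac_sha256(k_service, "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
```

Calling `presign_get("examplebucket", "test.txt", "AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY", "us-east-1", "20130524T000000Z", 86400)` returns a URL that can be handed to any HTTP client; nothing S3-specific is needed on the receiving side.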

Ceph was also released many years after S3 was released. And I've never seen a highly performant 9P implementation come anywhere close to even third party S3 implementations. There was nothing for Amazon to copy. That's why everyone else copied Amazon, instead.

It's not the most insanely hyper-optimized thing from the user POV (HTTP, etc) and in the past some semantics were pretty underspecified e.g. before full consistency guarantees several years ago, you only got "read your writes" and that's it. But it's not that hard to see why it's popular, IMO, given the historical context and use cases. It's hard to beat in the average case for both ease of use and commitment.

Thanks, I see now. Essentially I lacked the original context. I got many excellent answers and can’t reply to everyone.
When S3 was released the Internet was very different. Two of the things that stood out were:

1. It offered a resilient key/object store over HTTP.

2. By the standards of the day for bandwidth and storage it was (and to a certain extent still is) very inexpensive.

Since then much of AWS has been built on the foundation of S3 and so its importance has changed from merely being a tool to basically a pervasive dependency of the AWS stack. Also, it very much is designed for objects larger than 1KB and for applications that need durable storage of many, many large objects.

The key benefit, at least according to AWS marketing, is that you don't have to host it yourself.

Simple api

Absurdly cheap storage

Extremely HA

Absurdly durable

Effectively unlimited bandwidth

Effectively unbounded storage without reservation or other management

Everything supports its api

It’s not a file system. It’s a blob store. It’s useful for spraying vast amounts of data into it and getting vast amounts of data out of it at any scale. It’s not low latency, it’s not a block store, but it is really cheap and the scaling of bandwidth and storage and concurrency make it possible to build stuff like snowflake that couldn’t be built on Ceph in any reasonable way.

The problem is S3 is just a lexicographically ordered key value store with (what I suspect is) key-range partitions[1] for the key part and Reed-Solomon encoded blobs for the value part. In other words, it’s a glorified NoSQL database with no semantics that you’d typically expect of a file system, and therefore repeated writes are slow because any modification to an object involves writing a new version of the key along with its new object.
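To make that mental model concrete, here is a toy sketch of a lexicographically ordered key/value store in which any "modification" rewrites the whole object. The class `ToyObjectStore` and its methods are hypothetical names; this illustrates the semantics described above and says nothing about how S3 is actually implemented (no partitions, no erasure coding, no versions).

```python
import bisect


class ToyObjectStore:
    """Toy model: sorted keys, whole-object values, no in-place updates."""

    def __init__(self):
        self._keys = []   # kept in lexicographic order
        self._blobs = {}  # key -> immutable bytes

    def put(self, key: str, data: bytes) -> None:
        # A put always replaces the entire object.
        if key not in self._blobs:
            bisect.insort(self._keys, key)
        self._blobs[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def list_prefix(self, prefix: str) -> list:
        # Ordered keys make prefix listing a range scan: seek, then walk.
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        return matches

    def append(self, key: str, more: bytes) -> None:
        # No partial writes: read the object, rebuild it, write it all back.
        self.put(key, self.get(key) + more)
```

The "directories" people see in such a store are just shared key prefixes, and `append` shows why repeated small updates are expensive: each one costs a full read plus a full rewrite of the object.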

[1] https://martinfowler.com/articles/patterns-of-distributed-sy...

These aren't really problems tho, just features.

These features may or may not be a problem for your application depending on your specific requirements.

It's clear that for many many applications S3 works just fine.

If you require file system semantics or interfaces (i.e. POSIX) or you update objects a lot or require non-sequential updates or.... then maybe it's not for you.

S3 is straight HTTP, the most widespread API. It can be directly used on the browser, has libraries in pretty much every language, and can reuse the mountain of available software and frameworks for load-balancing, redirections, auth, distributed storage etc
I think there's an interesting story in software ecosystems where there are two flavors of applications (which coexist): those that prefer object stores over filesystems, and vice versa. A good reference point for this, I think, exists in many modern video transcoding infrastructures.

Using something like FSx [1] gives you a performant option for the use cases when the tooling involved prefers filesystem semantics.

[1] https://aws.amazon.com/fsx/lustre/

Here are reasons I'm using S3 in some projects:

1. Cost. It might vary depending on vendor, but generally S3 is much cheaper than block storage, at the same time with some welcome guarantees (like 3 copies).

2. Pay for what you use.

3. Very easy to hand off URL to client rather than creating some kind of file server. Also works with uploads AFAIR.

4. Offloads traffic. Big files are often the main source of traffic on many websites. Using S3 removes that burden. And S3 is usually served by multiple servers, which further increases speed.

5. Provider-independent. I think that every mature cloud offers S3 API.

I think that there are more reasons. Encryption, multi-region and so on. I didn't use those features. Of course you can implement everything with your own software, but reusing good implementation is a good idea for most projects. You don't rewrite postgres, so you don't rewrite S3.

Thanks, I was unclear and meant only the protocol S3 not the service, but I see now that as a KV store it makes sense.
> In my personal experience with S3 it’s always been super slow.

Numbers? I feel like it's been a while, but my experience was it is in the 50ms latency range. That's fast enough that you can do most things. Your page loads might not be instant, but 50ms is fast enough for a wide range of applications.

The big mistake I see though is a lack of connection pooling: I find code going through the entire TCP connection setup, TLS setup, just for a single request, tearing it all down, and repeating. boto also encourages some code patterns which result in GET bucket or HEAD object requests which you don't need and can avoid; none of this gives you good latency.
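The connection-reuse point can be sketched without touching AWS at all. The example below starts a throwaway local HTTP/1.1 server that counts accepted connections, then issues the same number of GETs with and without reusing one `http.client.HTTPConnection`. The names `CountingHandler` and `count_connections` are made up for this sketch, and a plain local HTTP server stands in for S3, so it shows only the connection count, not real TLS or network cost.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows keep-alive on a connection
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection.
        super().setup()
        with CountingHandler.lock:
            CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(n_requests: int, reuse: bool) -> int:
    """Issue n_requests GETs and return how many TCP connections the server saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    try:
        shared = http.client.HTTPConnection(host, port) if reuse else None
        for _ in range(n_requests):
            conn = shared or http.client.HTTPConnection(host, port)
            conn.request("GET", "/object")
            conn.getresponse().read()  # drain the body so the socket can be reused
            if not reuse:
                conn.close()
        if shared is not None:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections
```

With reuse, ten requests should ride on a single connection; without it, each request pays for its own setup and teardown, which against a real HTTPS endpoint also means a fresh TLS handshake every time.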

S3 works over HTTP, which means that it is designed to work over the internet.

Other protocols you mentioned, including NFS, do not work well over the internet.

Some of them are exclusively designed to work within the same network, or very sensitive to network latency.

> Forgive the question but I never quite understood the point of S3.

S3 and DynamoDB are essentially a decoupled BigTable; in that both are KV databases: One is used for high performance, small obj workloads; the other for high throughput, large obj workloads.

They have NFS (called EFS), but it's about 10x more expensive.
I wouldn’t give a number because the pricing models are fairly different and the real cost will depend on how you’re using it and how easy it is to shift your access patterns. On my apps using EFS, that 10x is more like 0.8-1.1x, an easy call versus rewriting a bunch of code.
Good luck mounting EFS in Windows.
Apparently the nfs client in windows only supports nfs v3 - while efs only supports v4. The closest I found was:

http://citi.umich.edu/projects/nfsv4/windows/readme.html

Seems odd that there are no commercial nfs v4 clients for windows? Might now be possible to mount via wsl?

I see ReactOS has an NFS client, but I could not figure out which version...

WinFsp (FUSE for Windows) has an NFS driver: https://github.com/winfsp/nfs-win
Do you mean EFS specifically, or you find that NFS doesn't work? Because it was my recollection that Windows included NFS machinery natively
EFS - this is what is being talked about here.
AWS also offers FSx for Windows File Server, and FSx for ONTAP if you need remote Windows file service.
We are talking about EFS here
S3 is slow but at the same time low cost; if you want fast, AWS has other alternatives, but they are pricier.
This is misleading. S3 is slow, and it is also incredibly fast: slow when you’re sequentially writing (or reading) objects, and fast when concurrently writing (or reading) vast numbers of them.
That depends on what you consider "fast". EFS (the "serverless" NFS) has sub-millisecond operation latency. S3 is more in the 10-20ms range for most operations, with occasional spikes.

BTW, if you need a pure Go client for NFSv4 (including AWS EFS), feel free to check my: https://github.com/Cyberax/go-nfs-client

We can write vast numbers and volume of objects to S3 per second using concurrent processes (spawn 1000 lambda invocations and try it). As long as I have the network bandwidth, I can push stuff essentially as fast as I want. Is that true for EFS? Handle limits. Network interface limits. Protocol limits.
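The latency versus throughput distinction can be shown with a toy simulation. Below, `fake_put` is a hypothetical stand-in for one object write that simply sleeps for an assumed 20 ms; that number is illustrative and not a measured S3 figure. Per-request latency stays the same in both runs, while wall-clock time for the whole batch drops sharply once the requests overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.02  # assumed per-request latency, not a measurement


def fake_put(key: str) -> str:
    """Stand-in for one object write: block for the simulated latency."""
    time.sleep(SIMULATED_LATENCY_S)
    return key


def timed_sequential(keys) -> float:
    """Write objects one after another; total time is roughly N * latency."""
    start = time.perf_counter()
    for key in keys:
        fake_put(key)
    return time.perf_counter() - start


def timed_concurrent(keys, workers: int) -> float:
    """Write objects from a thread pool; requests overlap while they wait."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_put, keys))
    return time.perf_counter() - start
```

Running `timed_sequential(keys)` and `timed_concurrent(keys, workers=len(keys))` over the same list of keys makes the gap visible. Whether a real filesystem or object store sustains that fan-out is exactly the handle, interface, and protocol limit question raised above, which a sleep-based sketch cannot answer.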

I’m not saying that S3 is perfect or even good for most workloads. However, it is most excellent when the workload fits.

Yea it kind of is! I've used EFS in real-world scenarios with more than 1,000 concurrent readers/writers. EFS's costs are just otherworldly compared to S3. If you need that interface though, it's a good (albeit expensive) choice.
At one point we had a ~560tb EFS disk that ran a variety of mixed workloads (large and small files). It was untenable - raw reading/writing IO is OK, but metadata IO hits a brick wall and destroys the performance of the whole disk for all connections (not just ones accessing a particular partition/tree/whatever).

In order to migrate off it and onto s3 I had to build a custom tool in rust that used libnfs directly to list the contents of the disk. We then launched a large number of lambdas to copy individual files to s3.

It was fun, but in my experience EFS is only good if you have a very homogeneous workload and are able to carefully optimise metadata IO. I wouldn’t recommend it - s3 is just cheaper, faster and better.

EFS will handle 1000 readers/writers. We tested it as a data exchange medium for computational tasks. The meta-information APIs in EFS in my experience are faster than S3's (LIST in S3 is notorious). The overall amount of data we stored in EFS was pretty limited (single-digit terabytes), though.

I wouldn't use EFS to store petabytes of data, but if you need a resilient and scalable storage that you can easily integrate into your application, then EFS is great.

One thing that I loved, is the ease of use in local development. With EFS you can simply mount the shared volume into your Docker/K8s container in production, and a local directory when you're developing tasks locally on your laptop. You can even run tasks without a container and monitor their output by looking at the exchange directory. There are AWS API emulators (e.g. Localstack) but they are not as convenient.

fast is an overloaded word. Could mean throughput, or latency. S3 throughput is incredible.

Note: I worked at Amazon on S3, 2015-2017.

I could have worded it better. I was just trying to understand the why of the protocol. My experience was probably irrelevant as we were just using it for storage and interfacing with an FS translation (rclone or similar). We have long stopped using AWS for cost reasons though.
Same with my experience. Not a fan