by mlhpdx 1191 days ago
We can write a vast number and volume of objects to S3 per second using concurrent processes (spawn 1,000 Lambda invocations and try it). As long as I have the network bandwidth, I can push stuff essentially as fast as I want. Is that true for EFS? Handle limits. Network interface limits. Protocol limits.

I’m not saying that S3 is perfect or even good for most workloads. However, it is most excellent when the workload fits.
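
To make the "spawn many concurrent writers" experiment concrete, here is a minimal sketch that uses a thread pool in one process rather than 1,000 Lambda invocations. The names `put_all` and `s3_put` are made up for illustration, and the boto3-backed writer assumes boto3 is installed and AWS credentials are configured. It shows the shape of the test, not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple


def put_all(
    put: Callable[[str, bytes], None],
    objects: Iterable[Tuple[str, bytes]],
    max_workers: int = 64,
) -> int:
    """Write every (key, body) pair concurrently; return how many writes succeeded."""

    def upload(item: Tuple[str, bytes]) -> bool:
        key, body = item
        try:
            put(key, body)
            return True
        except Exception:
            # A failed write is counted as a failure, not retried (sketch only).
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(upload, objects))


def s3_put(bucket: str) -> Callable[[str, bytes], None]:
    """Build a writer backed by boto3 (assumption: boto3 and credentials are available)."""
    import boto3  # imported lazily so put_all can be tried without AWS

    client = boto3.client("s3")

    def put(key: str, body: bytes) -> None:
        client.put_object(Bucket=bucket, Key=key, Body=body)

    return put
```

Something like `put_all(s3_put("my-bucket"), ((f"test/{i}", b"x") for i in range(10_000)))` would then exercise a bucket (the bucket name is a placeholder). Passing the writer in as a callable also lets you point the same loop at a file on an EFS mount and compare.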

2 comments

Yeah, it kind of is! I've used EFS in real-world scenarios with more than 1,000 concurrent readers/writers. EFS's costs are just otherworldly compared to S3. If you need that interface though, it's a good (albeit expensive) choice.

At one point we had a ~560 TB EFS disk that ran a variety of mixed workloads (large and small files). It was untenable - raw read/write IO is OK, but metadata IO hits a brick wall and destroys the performance of the whole disk for all connections (not just the ones accessing a particular partition/tree/whatever).

To migrate off it and onto S3 I had to build a custom tool in Rust that used libnfs directly to list the contents of the disk. We then launched a large number of Lambdas to copy individual files to S3.
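
The list-then-fan-out shape of that migration can be sketched roughly as below. This is not the tool described above: it walks a mounted directory with `os.walk` instead of talking to libnfs, uses threads instead of Lambdas, and groups paths into batches, which is an assumption of the sketch. `copy_batch` is a hypothetical callable that copies the given relative paths to S3 and returns how many it copied.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def list_files(root: str) -> Iterator[str]:
    """Yield every file under root as a path relative to root, using '/' separators."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            yield rel.replace(os.sep, "/")


def batches(items: Iterable, size: int) -> Iterator[List]:
    """Group items into lists of at most `size` elements."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def migrate(
    root: str,
    copy_batch: Callable[[List[str]], int],
    batch_size: int = 500,
    max_workers: int = 32,
) -> int:
    """Walk the tree once, then hand each batch of paths to copy_batch in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(copy_batch, batches(list_files(root), batch_size)))
```

The point of the split is that only one process walks the tree and touches the metadata path, while the copy workers only read the files they are handed.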

It was fun, but in my experience EFS is only good if you have a very homogeneous workload and are able to carefully optimise metadata IO. I wouldn’t recommend it - S3 is just cheaper, faster and better.

EFS will handle 1,000 readers/writers. We tested it as a data exchange medium for computational tasks. In my experience, metadata operations in EFS are faster than S3's (LIST in S3 is notoriously slow). The overall amount of data we stored in EFS was pretty limited (single-digit terabytes), though.

I wouldn't use EFS to store petabytes of data, but if you need resilient, scalable storage that you can easily integrate into your application, then EFS is great.

One thing I loved is the ease of use in local development. With EFS you can simply mount the shared volume into your Docker/K8s container in production, and mount a local directory in its place when you're developing tasks on your laptop. You can even run tasks without a container and monitor their output by looking at the exchange directory. There are AWS API emulators (e.g. LocalStack) but they are not as convenient.
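
A minimal sketch of that setup: the task code only ever sees a directory path, so the same code runs against an EFS mount in production and a plain folder on a laptop. The `EXCHANGE_DIR` variable, the `./exchange` default, and the function names are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def exchange_dir() -> Path:
    """Resolve the shared directory: the EFS mount in production, a local folder otherwise."""
    # EXCHANGE_DIR is a made-up variable name; in production it would point at the mount.
    path = Path(os.environ.get("EXCHANGE_DIR", "./exchange"))
    path.mkdir(parents=True, exist_ok=True)
    return path


def write_result(task_id: str, result: dict) -> Path:
    """Write a task's output via a temp file and a rename, so readers see no partial file."""
    target = exchange_dir() / f"{task_id}.json"
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(result, f)
    # Rename within the same directory; assumed to be atomic on the mount as well.
    os.replace(tmp, target)
    return target


def read_result(task_id: str) -> dict:
    """Read back a task's output from the exchange directory."""
    return json.loads((exchange_dir() / f"{task_id}.json").read_text())
```

Locally you can `ls` the exchange folder or open the JSON files directly to see what a task produced, which is the convenience described above.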