by ryeguy 1584 days ago
Could you use EFS to host a horizontally scalable relational database by using finely sharded SQLite dbs? Like if you had 1 db per user, for example.
6 comments

I do something similar, but test out the performance before you commit to it. There is a massive chasm between how EFS is marketed and how it actually performs. EFS is the slowest possible way to store data in AWS, with painfully low per-client and aggregate throughput. We implemented EFS because it was easy and then immediately commenced the project to replace it with S3. The bare EFS solution didn't survive long enough to make it to the S3 project's finish; we had to add a layer of caching instances in EC2 to bring EFS throughput up to an acceptable level. We almost needed two layers of caching instances; i.e. EFS fans out to N caching instances, which then fan out to M caching instances, which then fan out to P workers, P >> M >> N. Because the EFS throughput is so poor, it could barely cope with the fan-out to the one layer of caching instances. Fortunately the S3 migration project finished before we got that far.

This announcement is, of course, the result of adding a caching layer on top of EFS. Naturally. But because they mention only "latency" and never "throughput", I'm betting they have not used the cache layer to increase throughput.

I'll only ever recommend EFS if your data is very small and the throughput requirements are negligible.

We had to move 300 TB of EFS data into S3 in 40 hours, and this really showed how poor EFS performance can be.

The cause was purely metadata: metadata operations cap out at about 40mbs. Raw read speed is great, but anything metadata-heavy hits that cap quite hard.

We had to hack together a special NFS client to list the contents of the drive using as few metadata operations as possible, then have a separate step to copy the data.
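To illustrate the "list first, copy later" split described above, here is a rough Python sketch. It is not the custom NFS client mentioned in the comment; it only uses the standard library's `os.scandir`, which on many systems can report the entry type from the directory read itself instead of issuing a separate stat call per file (on some NFS mounts it may still fall back to a stat). The function names and the manifest format are made up for the example.

```python
import os
from typing import Iterator


def list_files(root: str) -> Iterator[str]:
    """Yield every regular file under root, walking iteratively."""
    stack = [root]
    while stack:
        current = stack.pop()
        # scandir returns entries with cached type info where the OS provides it,
        # so is_dir()/is_file() often need no extra metadata round trip.
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path


def write_manifest(root: str, manifest_path: str) -> int:
    """Step 1: record relative paths, one per line. Returns the file count.

    Assumes file names contain no newlines. Keep the manifest outside root,
    otherwise it would list itself.
    """
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for path in list_files(root):
            out.write(os.path.relpath(path, root) + "\n")
            count += 1
    return count
```

The copy step can then read the manifest and fan the paths out to as many workers as the raw read throughput supports, without touching directory metadata again.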

> Could you use EFS to host a horizontally scalable relational database by using finely sharded SQLite dbs? Like if you had 1 db per user, for example.

This is actually a more common use case than I had imagined just one year ago. It would work today, and there are a few short-term optimizations we're making to file locking that will make this much better. If you reach out to AWS support or your TAM, we can share more information and timelines under NDA.

I actually recently delivered a project which uses almost this exact technique. Python lambdas that operate on SQLite files have the benefit of being much simpler and cheaper than most other scalable database solutions (like Aurora) for very light loads.
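For anyone curious what "one SQLite db per user" might look like in Python, here is a minimal sketch. It is not the code from the project above. The `notes` table, the function names, and the hashed two-character bucket layout are assumptions for illustration, and `root` stands in for wherever the EFS mount lives (something like `/mnt/efs` in a Lambda, which is also an assumption).

```python
import hashlib
import os
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"


def user_db_path(root: str, user_id: str) -> str:
    """Map a user id to its own SQLite file.

    Hashing the id gives a filesystem-safe name, and the two-character
    bucket keeps any single directory from growing without bound.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    directory = os.path.join(root, digest[:2])
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, digest + ".sqlite3")


def add_note(root: str, user_id: str, body: str) -> int:
    """Insert one note into the user's own database and return its row id."""
    conn = sqlite3.connect(user_db_path(root, user_id), timeout=5.0)
    try:
        conn.execute(SCHEMA)
        with conn:  # commits on success, rolls back on error
            cur = conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
        return cur.lastrowid
    finally:
        conn.close()


def list_notes(root: str, user_id: str) -> list:
    """Return the user's notes in insertion order."""
    conn = sqlite3.connect(user_db_path(root, user_id), timeout=5.0)
    try:
        conn.execute(SCHEMA)
        return [row[0] for row in conn.execute("SELECT body FROM notes ORDER BY id")]
    finally:
        conn.close()
```

Because each request only ever opens the one file belonging to its user, two users never contend for the same lock, which is the whole appeal of sharding this finely.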

That said, accessing SQLite databases is surprisingly disk IO heavy. I haven't gone too deeply into measuring the effect, but it seems the core issue is that traditional RDBMS wire protocols are more efficient than SQLite's disk accesses wrapped over NFS (or whatever connection the lambda/EFS join is). Things especially start to break down when you need any sort of concurrent access. The small overhead for locking/unlocking files can quickly become awful when multiplied by the EFS latency, so you really do need extremely fine sharding.
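One way to keep the lock/unlock overhead down, sketched below under some assumptions: group writes into a single transaction so the file is locked and released once per batch rather than once per row. I haven't measured this on EFS; the `events` table and the `record_events` name are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def record_events(db_path: str, events: Iterable[Tuple[str, float]]) -> int:
    """Write all events in one transaction and return the table's row count."""
    # timeout makes a writer wait for a competing lock instead of failing at once.
    conn = sqlite3.connect(db_path, timeout=10.0)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (name TEXT NOT NULL, value REAL NOT NULL)"
        )
        with conn:  # all inserts share one transaction: one commit, one lock cycle
            conn.executemany(
                "INSERT INTO events (name, value) VALUES (?, ?)", events
            )
        return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    finally:
        conn.close()
```

Calling `record_events(path, [("login", 1.0), ("click", 2.5)])` commits both rows together. Committing each row separately would repeat the lock and sync work per row, and every one of those round trips pays the network filesystem's latency.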

I've wanted to use "SQLite db per user" for a while now but haven't had the right problem domain for it yet. https://engineering.backtrace.io/2021-12-02-verneuil-s3-back... looks really interesting.
I guess one could consider using Amazon FSx for Lustre: https://docs.aws.amazon.com/fsx/latest/LustreGuide/what-is.h...
You could also consider using Amazon FSx for NetApp ONTAP.
Yeah, think you could. They have a lambda adapter as well, so you’d get nearly infinite scale straight away.