by count 578 days ago
I don't see any other question about it, so maybe I just missed the obvious answer, but how do you handle POSIX ACLs? If the data is stored as an object in S3, but exposed via filesystem, where are you keeping (if at all?) the filesystem ACLs and metadata?

Also, NFSv3 and not 4?


Great call out. Some kinds of data, like ACLs and specific kinds of metadata, don't live in S3. Full disclosure, we don't support ACLs today (but plan to soon). We keep file system metadata in the durable cache. For some files (where users haven't changed permissions, etc), we are able to release that cached metadata when the file is no longer in use. For other files (where permissions have been changed by the user), that metadata must live in the cache long-term.
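
The release rule described above can be sketched roughly as follows. This is only an illustration of the stated policy, not Regatta's actual code; the `CachedMetadata` type, its field names, and `can_release` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedMetadata:
    """Hypothetical record for one file's metadata held in the cache."""
    path: str
    in_use: bool               # assumption: a client currently has the file open
    permissions_changed: bool  # assumption: the user changed mode/owner/etc.


def can_release(meta: CachedMetadata) -> bool:
    """Return True if the cached metadata may be dropped.

    Per the comment above: metadata for untouched files can be released
    once the file is idle, while metadata that carries user-made
    permission changes has to stay in the cache long-term.
    """
    return not meta.in_use and not meta.permissions_changed
```

Under this sketch, an idle file with default permissions is releasable, while a file that is still open, or one whose permissions were changed by the user, stays pinned in the cache.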

We selected NFSv3 due to its broad compatibility with different compute environments. For example, Windows has a built-in NFSv3 client, but doesn't have an NFSv4 client. There are lots of enterprise workloads that need simultaneous access to file data from both Windows and Linux, and supporting NFSv3 was the easiest path to support those workloads.

Do you pay for metadata accesses? Does running a `find` across the filesystem cost anything? What about system calls that don't transfer data? Can I move or rename a file without paying to copy and then delete the associated S3 object?
Today, we only charge for cache usage (storage) and data transfer between Regatta and S3. If your metadata access doesn't require transfer to S3, then it doesn't cost anything! However, renames do require transfer to S3 (because we have to move the object on the backend).
Does that mean you pay for the storage twice (i.e., in S3 and in Regatta), or is the cache size tunable?
That’s correct — you pay for the storage yourself in S3, and then you pay for the storage when it’s in the Regatta cache. We may expose the ability to limit the cache size in the future for teams who need controllable costs more than the highest performance.
Thanks, I keep hoping someone comes up with some magic :)
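
For anyone trying to reason about the bill, here is a rough sketch of the pricing model as described in this subthread: you pay for cache storage plus transfer between Regatta and S3, and metadata-only operations are free as long as they are served from the cache. The operation names, the prices, and the idea that a rename is billed at the full object size are all assumptions for illustration; the thread only says that renames require transfer.

```python
from dataclasses import dataclass

# Assumption: these operations are answered from cached metadata and
# therefore cause no transfer to S3. If the metadata were not cached,
# they might not be free.
METADATA_ONLY = {"find", "stat", "readdir"}


@dataclass
class Op:
    kind: str             # hypothetical labels, e.g. "find", "rename", "read_miss"
    size_gb: float = 0.0  # size of the object touched, if any


def billable_transfer_gb(ops):
    """Sum the transfer that would be billed for a list of operations."""
    total = 0.0
    for op in ops:
        if op.kind in METADATA_ONLY:
            continue  # served from the cache, no S3 transfer
        # Assumption: anything else (including a rename) moves the whole object.
        total += op.size_gb
    return total


def regatta_cost(cache_gb, transfer_gb, cache_price_per_gb, transfer_price_per_gb):
    """Cache storage plus transfer; the S3 storage bill is separate."""
    return cache_gb * cache_price_per_gb + transfer_gb * transfer_price_per_gb
```

With placeholder prices, `regatta_cost(100, billable_transfer_gb([Op("find"), Op("rename", 2.0)]), 0.5, 0.25)` charges for 100 GB of cache plus the 2 GB rename and nothing for the `find`. The S3 copy of the same data is billed by AWS on top of that, which is the "paying twice" part.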

Is the intent to run this in-vpc?

And how do you differentiate from AWS Storage Gateway?

I'd love to hear more about what you're excited to do when the magic arrives. :D

We are running it as a managed SaaS, so our customers connect to the caching layer that runs in the Regatta VPC. This allows us to manage the infrastructure for them and keep costs low.

Storage Gateway is an interesting product, and I worked closely with that team for several years -- so mad respect for them. It was designed to be an appliance that you run on servers in your own data center (of course, many customers now deploy it to EC2). Because of this, it's designed to operate in an environment with "finite storage" -- for example, some workload patterns can thrash the cache, which results in poor performance for clients, and it's not designed to run in a high-availability cluster in the cloud. Regatta solves these problems with durable cache storage that's safe to keep data in long-term, and is designed for high availability.