

	by huntaub 580 days ago
	Hey, thanks for reaching out. The caching layer does return success before writing to S3 -- that's how we get good performance for all operations, including those which can't be done efficiently in S3 (such as random writes, renames, or file appends). Because the caching layer is durable, we can safely apply these changes to the S3 bucket asynchronously. Most operations appear in the S3 bucket within a minute!
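
To make the "acknowledge first, apply to S3 later" idea concrete, here is a minimal sketch of a write-back cache. It is not a description of Regatta's implementation: `WriteBackCache`, the in-memory `journal` standing in for the durable cache, and the plain `dict` standing in for the S3 bucket are all hypothetical names chosen for illustration.

```python
import queue
import threading


class WriteBackCache:
    """Toy write-back cache: acknowledge after a local append, flush later."""

    def __init__(self, bucket):
        self.bucket = bucket          # stands in for the S3 bucket (a dict here)
        self.journal = []             # stands in for the durable cache layer
        self.recent = {}              # latest data per key, served to readers
        self.pending = queue.Queue()  # writes not yet applied to the bucket
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def write(self, key, data):
        # Record the write locally, then return success without
        # waiting for the bucket to be updated.
        self.journal.append((key, data))
        self.recent[key] = data
        self.pending.put((key, data))
        return True

    def read(self, key):
        # Recent writes are visible immediately, even if not yet flushed.
        if key in self.recent:
            return self.recent[key]
        return self.bucket.get(key)

    def _flush_loop(self):
        # Background worker: apply acknowledged writes to the bucket in order.
        while True:
            key, data = self.pending.get()
            self.bucket[key] = data
            self.pending.task_done()

    def drain(self):
        # Block until every acknowledged write has reached the bucket.
        self.pending.join()
```

In this sketch a caller sees `write()` return as soon as the journal append completes, while `drain()` shows the eventual state of the bucket. A real system would need the journal itself to survive crashes, which this in-memory list does not.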


mbrt 580 days ago

Very nice, I like the approach. I assume data is partitioned and each file is handled by an elected leader? If data is replicated, you still need a consensus algorithm on updates.

How are concurrent updates to the same file handled? Either only one client can open the file for writing at any one time, or you need fencing tokens.
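
For readers unfamiliar with fencing tokens, a rough sketch of the idea follows. `LockService` and `FencedStore` are hypothetical names, and nothing here is claimed about how Regatta works: the lock service hands out monotonically increasing tokens, and the store rejects any write carrying a token older than one it has already seen for that file.

```python
class LockService:
    """Hands out a strictly increasing token each time a lock is granted."""

    def __init__(self):
        self._next = 0

    def acquire(self, path):
        self._next += 1
        return self._next


class FencedStore:
    """Rejects writes from holders whose lock has been superseded."""

    def __init__(self):
        self.data = {}
        self.highest_token = {}  # per-file highest token seen so far

    def write(self, path, token, payload):
        # A smaller token means a newer lock holder has already written,
        # so this writer's lock must have expired or been revoked.
        if token < self.highest_token.get(path, 0):
            raise PermissionError(f"stale fencing token {token} for {path}")
        self.highest_token[path] = token
        self.data[path] = payload
```

A client that pauses (for example during a long GC) and wakes up after its lock was reassigned will still hold its old token, so its late write is refused instead of silently overwriting newer data.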


huntaub 580 days ago

Without getting too much into internals which could change at any time, yes. You have to replicate, partition, and serve consensus over data to achieve high durability and availability.

For concurrent updates, the standard practice for remote file systems is to use file locking to coordinate concurrent writes; without locks, NFS doesn't make any guarantees about WRITE operation ordering. If you're talking about concurrent writes to the same file from NFS and S3 simultaneously, that leads to undefined behavior. We think this is okay as long as we do a good job of detecting it and alerting the user when it occurs, because we don't think any applications are currently written to do this kind of simultaneous data editing (Regatta didn't exist yet).
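
As an illustration of the file-locking approach mentioned above, here is a minimal sketch using POSIX advisory locks from Python's standard `fcntl` module (Unix only). The helper name `append_record` is made up for this example, and whether a given NFS mount honors these locks depends on the client and server configuration.

```python
import fcntl
import os


def append_record(path, record: bytes):
    """Append a record while holding an exclusive advisory lock on the file."""
    with open(path, "ab") as f:
        # Blocks until no other cooperating process holds the lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # push the data out before releasing the lock
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

The lock is advisory: it only orders writers that also take the lock, so every process touching the file has to follow the same protocol.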


mbrt 580 days ago

Thanks for the details!

Consistency at the level of an individual file can be guaranteed this way, but I don't think this works across multiple files (as you need a global total order of operations). In any case, this is a pragmatic solution, and I like the tradeoffs. Comparing against NFS rather than Spanner seems the right way to look at it.


huntaub 580 days ago

This is also interesting, in that I don’t think that the file system paradigm actually requires a global total ordering of operations (and, in fact, many file systems don’t provide this). I know that sounds like snapshots wouldn’t be valid, but I think that applications which really care about data consistency (such as databases) are built specifically to handle this (with things like write-ahead logs).
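
To show why a write-ahead log lets an application tolerate a file system without global ordering, here is a small sketch. `TinyWAL` is a hypothetical class, far simpler than what a real database does: every update is appended and fsynced to the log before it is applied, and on startup the log is replayed, discarding a torn final record.

```python
import json
import os


class TinyWAL:
    """Key-value state rebuilt from an append-only log on startup."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        self._replay()
        self._log = open(log_path, "ab")

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        valid_bytes = 0
        with open(self.log_path, "rb") as f:
            for line in f:
                # A record without a trailing newline was cut off mid-write.
                if not line.endswith(b"\n"):
                    break
                try:
                    entry = json.loads(line)
                except ValueError:
                    break
                self.state[entry["key"]] = entry["value"]
                valid_bytes += len(line)
        # Drop anything after the last complete record so new appends
        # start on a clean boundary.
        os.truncate(self.log_path, valid_bytes)

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value}).encode("utf-8")
        self._log.write(record + b"\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # log is durable before state changes
        self.state[key] = value

    def close(self):
        self._log.close()
```

Because recovery depends only on the order of records within one log file, the application does not need the file system to order operations across different files.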


ignoramous 580 days ago

Regatta is a write-through cache for an S3 bucket under its supervision? I guess then external changes to that bucket are a no-no?

Any plans to expand to other stores, like R2 (I ask since unlike S3, R2 egress is free)?


huntaub 579 days ago

Hey there, that's sort of the correct way to think about it -- notably that our caching layer is high-durability, so we can keep recent writes in the cache safely. External changes to the bucket are okay! Lots of customers need to (for example) ingest data into S3, then process it on a file system, and that totally works. The only thing that isn't supported is editing the same file from both S3 and the file system simultaneously. We think this is a super rare case, and probably doesn't exist today (because there isn't anything that bridges S3 and file semantics yet).

We support all S3-compatible storage services today, including R2, GCS, and MinIO.
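
The one unsupported case above (the same file edited from both S3 and the file system at once) can at least be detected in principle. The sketch below is purely illustrative and is an assumption, not Regatta's mechanism: `CachedFile`, `check_before_flush`, and the opaque version tag (similar in spirit to an S3 ETag) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    synced_version: str  # version tag of the object when last synced
    dirty: bool          # True if local writes have not been flushed yet


def check_before_flush(cached: CachedFile, bucket_version: str) -> str:
    """Decide what to do with a cached file given the bucket's current version."""
    changed_in_bucket = bucket_version != cached.synced_version
    if cached.dirty and changed_in_bucket:
        return "conflict"  # both sides edited: alert the user
    if changed_in_bucket:
        return "refresh"   # only the bucket changed: reload the cached copy
    if cached.dirty:
        return "flush"     # only the file system changed: write to the bucket
    return "clean"
```

The ingest-then-process workflow described above corresponds to the `refresh` path, and ordinary file system writes to the `flush` path; only the `conflict` path needs an alert.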


ignoramous 579 days ago

I actually asked about R2 to see if Regatta's pricing is any different as there's no egress fee. I should have been clearer.

btw, thanks a bunch for answering my Q & everyone else's too (except for parts where you couldn't talk about the implementation, understandably so). Appreciate it. Wishing the best.
