by ragulpr 583 days ago
Love this idea! The biggest hurdle, though, has been getting predictable auth and I/O across multiple Python/Scala versions and everything else (Spark, orchestrators, CLIs for teams on various OSes, etc.). Add access logs on top of that.

S3Fs/boto/botocore versions x Scala/Spark x Parquet x Iceberg x k8s etc.: each reader makes its own assumptions, which makes reading from S3 alone a maintenance and compatibility nightmare.

Will the mounted system _really_ be accessible as local fs and seen as such to all running processes? No surprises? No need for python specific filesystem like S3Fs?

If so, then you will win 100%. I wouldn't even care about speed/cost as long as it's on par with S3.

1 comments

Yeah, that's exactly right. I had some... experiences with Spark recently, that convinced me that this is something that could really help. I also really like the idea that organizations can continue to use S3 as the source of truth for their data (as you mention, it means that you can continue to use Access Logs, which would capture all usage of your S3 bucket across your applications).

> Will the mounted system _really_ be accessible as local fs and seen as such to all running processes? No surprises? No need for python specific filesystem like S3Fs?

Ha, well it depends on what you mean by surprises. We won't have a Python-specific file system. Our client is going to come in two flavors. Today, you can mount Regatta over NFSv3 (which we wrap in TLS to make it secure). This works for some workloads, but doesn't provide like-for-like performance with EBS. Over the next month, we plan to release the "custom protocol" that I wrote about above, that we expect to send to customers in the form of a FUSE file system.

Either way, it should be one package, you shouldn't need to worry about versioning, and it will appear as a real, local file system. :D
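
To make the "real, local file system" point concrete, here is a minimal sketch of what that would look like from Python, assuming the mount behaves as described above. The `MOUNT_ROOT` environment variable and the `roundtrip` helper are hypothetical names used only for illustration; they are not part of any Regatta package. The sketch falls back to a temporary directory so it runs without a mount.

```python
import os
import tempfile
from pathlib import Path


def roundtrip(root: str, name: str = "example.txt") -> str:
    """Write and then read a file under `root` using only the standard library."""
    path = Path(root) / name
    path.write_text("hello from plain file I/O\n")
    # Plain open() on a path: no S3Fs, boto, or other S3-specific client involved.
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # MOUNT_ROOT is a hypothetical variable pointing at wherever the file system
    # is mounted; a temporary directory stands in for it when it is not set.
    mount_root = os.environ.get("MOUNT_ROOT") or tempfile.mkdtemp()
    print("is a mount point:", os.path.ismount(mount_root))
    print(roundtrip(mount_root), end="")
```

If the mount really is visible to every process as an ordinary directory, the same path should work unchanged from Spark, shell tools, or any other runtime, which is the property the question above is asking about.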