

by cuno 976 days ago

Hi, author here.

Yes, I agree that you can't "just" put a POSIX API on S3, but that doesn't make it impossible. To keep the article to a reasonable length, I left a lot out. There are tradeoffs between POSIX semantics, consistency and performance. Each application/process has different needs, but the great thing about running right inside the process is that we can see what those needs are and adapt. For example, many applications have no need for random-access writes: the only libc calls and syscalls they issue are purely sequential. Some processes do random-access writes and protect them from other concurrent processes with POSIX record locks, and we can see that too. That means we treat these applications/files differently, with corresponding performance implications. This is very different from a normal filesystem, which has to treat every process the same way because it is a "black box".
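To make "see what those needs are and adapt" a bit more concrete, here is a hypothetical C sketch of the kind of per-file bookkeeping an in-process layer could do: note whether writes stay sequential and whether the process takes POSIX record locks (e.g. via `fcntl` with `F_SETLK`/`F_SETLKW`), then pick a handling mode. All names (`access_tracker`, `file_mode`, the `tracker_*` functions) and the three modes are illustrative assumptions, not the author's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-file state an in-process layer might keep. */
typedef struct {
    off_t next_offset;   /* where the next write lands if access stays sequential */
    bool  random_writes; /* set once any write lands anywhere else */
    bool  record_locks;  /* set once the process takes a POSIX record lock */
} access_tracker;

/* Hypothetical handling modes, chosen from the observed behaviour. */
typedef enum {
    MODE_STREAMING,  /* purely sequential writes: stream straight to the object store */
    MODE_RANDOM,     /* random writes, no locks: buffer locally, upload later */
    MODE_COORDINATED /* random writes protected by record locks: coordinate with other processes */
} file_mode;

void tracker_init(access_tracker *t) {
    assert(t != NULL);
    t->next_offset = 0;
    t->random_writes = false;
    t->record_locks = false;
}

/* Call on every write()/pwrite(); for write(), pass the current file offset. */
void tracker_on_write(access_tracker *t, off_t offset, size_t len) {
    assert(t != NULL);
    if (offset != t->next_offset)
        t->random_writes = true;
    t->next_offset = offset + (off_t)len;
}

/* Call when the process issues fcntl(F_SETLK / F_SETLKW) on this file. */
void tracker_on_lock(access_tracker *t) {
    assert(t != NULL);
    t->record_locks = true;
}

/* Sequential writers stream; random writers are split by whether they lock. */
file_mode tracker_mode(const access_tracker *t) {
    assert(t != NULL);
    if (!t->random_writes)
        return MODE_STREAMING;
    return t->record_locks ? MODE_COORDINATED : MODE_RANDOM;
}
```

For example, two back-to-back writes at offsets 0 and 100 (100 bytes, then 50) keep the file in `MODE_STREAMING`; a later write at offset 10 switches it to `MODE_RANDOM`, and a subsequent record lock moves it to `MODE_COORDINATED`. A real layer would also have to decide what to do with data already written before a mode switch, which this sketch ignores.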

You're right that AFS, and for that matter NFS, can in principle return an error on close(), which many existing applications unfortunately aren't written to handle. However, that doesn't mean NFS isn't practical: it is very widely used.
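On the application side, handling this mostly means checking the result of every call that can report a deferred write failure, including close(). NFS clients commonly flush dirty data at close, so an I/O error may surface there rather than at write(). Below is a minimal C sketch using only standard POSIX calls; the function name `write_file_checked` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write a whole buffer to path and report failure from
   any step, including close(). Returns 0 on success, -1 on error (errno set). */
int write_file_checked(const char *path, const void *buf, size_t len) {
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing anything: retry */
            int saved = errno;
            close(fd);
            errno = saved;         /* report the write error, not close()'s */
            return -1;
        }
        p += n;                    /* handle short writes */
        left -= (size_t)n;
    }

    /* Ask for the data to be made durable; on a networked filesystem this is
       one place a failed flush can be reported. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can still fail (e.g. EIO) on NFS/AFS-style filesystems, meaning
       the data may not have been stored. On Linux the descriptor is released
       even when close() reports an error, so it must not be retried. */
    if (close(fd) != 0)
        return -1;
    return 0;
}
```

Calling `write_file_checked("/tmp/out.txt", "hello", 5)` returns 0 on success; a caller that gets -1 should treat the file as not written and surface `errno`, rather than assuming the data landed.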

Our customers mostly run workloads in the same region as the object storage (whether in the cloud or on-prem), typically with very high availability. Since this is essentially a networked file system, you're right that it can't make much stronger guarantees than the NFS protocol itself does, but inside cloud infrastructure you typically see four nines (99.99%) availability.
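For a sense of scale, four nines (99.99%) availability leaves a fraction $10^{-4}$ of the time unavailable. Assuming a 365.25-day year and a 30-day month, that works out to roughly:

$$
\begin{aligned}
\text{downtime per year} &= 10^{-4} \times (365.25 \times 24 \times 60)\ \text{min} = 10^{-4} \times 525{,}960\ \text{min} \approx 52.6\ \text{min} \\
\text{downtime per month} &= 10^{-4} \times (30 \times 24 \times 60)\ \text{min} = 10^{-4} \times 43{,}200\ \text{min} = 4.32\ \text{min}
\end{aligned}
$$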


wrs 975 days ago

I’m sure you’re aware of the issues, and this looks like highly useful work! But I’ve seen enough human nature to know that a lot of people will see “POSIX API” and assume they can just run anything on it without further thought. I know this because for years I’ve seen people run things on AFS and NFS, see weird concurrency behavior or data loss or latency, and blame the filesystem for not performing miracles, rather than blaming the application for not taking nonlocal storage into account.

The strongest argument for using a different API for object storage is that you don’t get that excuse. The API presents the true semantics and failure conditions and the application needs to think them through. (And your position that it may not be a strong enough argument is perfectly valid.)
