| HN Mirror



	by dabinat 115 days ago
	The problem with using S3 as a filesystem is that it’s immutable, and that hasn’t changed with S3 Files. So if I have a large file and change 1 byte of it, or even just rename it, it needs to upload the entire file all over again. This seems most useful for read-heavy workflows of files that are small enough to fit in the cache.

3 comments

wolttam 115 days ago

That’s not that different from CoW filesystems - there is no rule that files must map 1:1 to objects; you can (transparently) divide a file into smaller chunks to enable more fine-grained edits.


vbezhenar 114 days ago

The most obvious approach seems to be to implement device blocks as S3 objects and use any existing file system on top of them.


yencabulator 113 days ago

S3 is notoriously miserable with small objects.


otterley 114 days ago

The unit of granularity for a CoW filesystem is a block, which is typically 4 kB or smaller. The unit of granularity for S3 is the entire object or 5 MB (minimum multipart upload size), whichever is smaller. The difference can be immense.
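
To put rough numbers on that gap, here is a minimal arithmetic sketch. It assumes binary units (4 KiB blocks, 5 MiB parts), a hypothetical 1 GiB file, and the best case where only one minimum-size part has to be re-sent; the helper name `bytes_rewritten` is made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def bytes_rewritten(file_size: int, unit: int) -> int:
    """Bytes that must be written again to change a single byte,
    when data can only be replaced in units of `unit` bytes."""
    return min(file_size, unit)


file_size = 1 * GIB   # hypothetical large file
cow_block = 4 * KIB   # typical CoW filesystem block
s3_part = 5 * MIB     # minimum multipart part size

cow_cost = bytes_rewritten(file_size, cow_block)
part_cost = bytes_rewritten(file_size, s3_part)
whole_object_cost = file_size  # no partial rewrite at all

print(part_cost // cow_cost)          # 1280
print(whole_object_cost // cow_cost)  # 262144
```

Under these assumptions a one-byte edit costs about 1,280 times more data at part granularity than at block granularity, and about 262,144 times more if the whole 1 GiB object is re-uploaded.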


direwolf20 114 days ago

But this doesn't


jamesblonde 115 days ago

Files can be immutable if you have mutable metadata - but S3 does not have mutable metadata, so you can't rename a directory without a full copy of all its contents.

The immutable-file problem can be solved by chunking files, which allows them to be opened and appended to - we do this in HopsFS. However, random writes are typically not supported in scale-out metadata file systems - but they are rarely used by POSIX clients, thankfully.


aforwardslash 115 days ago

It depends on how you implement the FS layer on top of S3. As a quick example, I've done a couple of implementations of exactly that, where a file is chunked into multiple S3 objects. This allows for CoW semantics if required, as well as parallel uploads and downloads. In the end, it heavily depends on your use case.
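
For illustration only, a minimal sketch of that chunking idea, not any particular implementation. An in-memory dict stands in for the bucket, chunk keys are content hashes (an assumption chosen for simplicity), and `ChunkedStore`, `write_at`, and the tiny 4-byte chunk size are all hypothetical. A file is a manifest (a list of chunk keys), so a small edit writes only the chunks it touches, and a rename changes only the manifest.

```python
import hashlib


class ChunkedStore:
    """Toy file layer that maps one file to many immutable chunk objects."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.objects = {}    # chunk key -> bytes (stand-in for a bucket)
        self.manifests = {}  # file name -> ordered list of chunk keys
        self.puts = 0        # number of chunk objects actually written

    def _put_chunk(self, data: bytes) -> str:
        # Assumption: chunks are keyed by content hash and never overwritten.
        key = hashlib.sha256(data).hexdigest()
        if key not in self.objects:
            self.objects[key] = data
            self.puts += 1
        return key

    def write_file(self, name: str, data: bytes) -> None:
        size = self.chunk_size
        self.manifests[name] = [
            self._put_chunk(data[i:i + size]) for i in range(0, len(data), size)
        ]

    def read_file(self, name: str) -> bytes:
        return b"".join(self.objects[key] for key in self.manifests[name])

    def write_at(self, name: str, offset: int, patch: bytes) -> None:
        """Copy-on-write edit: only chunks overlapping the patch are rewritten."""
        keys = list(self.manifests[name])
        total = sum(len(self.objects[key]) for key in keys)
        end = offset + len(patch)
        if offset < 0 or end > total:
            raise ValueError("this sketch only overwrites bytes inside the file")
        if not patch:
            return
        size = self.chunk_size
        for index in range(offset // size, (end - 1) // size + 1):
            start = index * size
            chunk = bytearray(self.objects[keys[index]])
            lo = max(offset, start)
            hi = min(end, start + len(chunk))
            chunk[lo - start:hi - start] = patch[lo - offset:hi - offset]
            keys[index] = self._put_chunk(bytes(chunk))  # new object, old one untouched
        self.manifests[name] = keys  # swapping the manifest publishes the edit

    def rename(self, old: str, new: str) -> None:
        # Metadata-only operation: no chunk data is copied.
        self.manifests[new] = self.manifests.pop(old)


store = ChunkedStore(chunk_size=4)
store.write_file("a.bin", b"abcdefghijkl")  # three chunk objects written
store.write_at("a.bin", 5, b"X")            # one more chunk object written
store.rename("a.bin", "b.bin")              # no chunk objects written
print(store.read_file("b.bin"), store.puts)
```

A real layer would also need durable manifest storage, garbage collection of unreferenced chunks, and a chunk size large enough to avoid the small-object penalty mentioned elsewhere in the thread.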
