Hacker News
by nayuki 3172 days ago
The article talks about referring to resources by using URLs containing opaque ID numbers versus URLs containing human-readable hierarchical paths and names. They give examples like bank accounts and library books.

This problem about naming URLs is also present in file system design. File names can be short, meaningful, context-sensitive, and human-friendly; or they can be long, unique, and permanent. For example, a photo might be named IMG_1234.jpg or Mountain.jpg, or it can be named 63f8d706e07a308964e3399d9fbf8774d37493e787218ac055a572dfeed49bbe.jpg. The problem with the short names is that they can easily collide, and often change at the whim of the user. The article highlights the difference between the identity of an object (the permanent long name) versus searching for an object (the human-friendly path, which could return different results each time).

For decades, the core assumption in file system design has been to provide hierarchical paths that refer to mutable files. A number of alternative systems have sprouted that upend this assumption by making all files immutable, addressed by hash, and searchable through other mechanisms. Examples include Git version control, BitTorrent, IPFS, Camlistore, and my own unnamed proposal: https://www.nayuki.io/page/designing-a-better-nonhierarchica... . (Previous discussion: https://news.ycombinator.com/item?id=14537650 )
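To make "immutable and addressed by hash" concrete, here is a minimal in-memory sketch. It is not how Git, IPFS, or any of the systems above actually store data; the ContentStore class and its put/get methods are names made up for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


class ContentStore:
    """Toy in-memory store: every blob is immutable and keyed by its SHA-256 hash.

    Hypothetical illustration only, not the design of any real system.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice is a no-op: same content, same key.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # The key doubles as an integrity check on read.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored data does not match its key")
        return data


store = ContentStore()
key_a = store.put(b"mountain photo bytes")
key_b = store.put(b"mountain photo bytes")         # identical content, identical key
key_c = store.put(b"edited mountain photo bytes")  # an "edit" is simply a new blob
```

There is no overwrite operation at all: editing a file means storing a new blob under a new key, and the old key keeps resolving to the old bytes.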

Personally, I think immutable files present a fascinating opportunity for exploration, because they make it possible to create stable metadata. In a mutable hierarchical file system, metadata (such as photo tags or song titles) can be stored either within the file itself, or in a separate file that points to the main file. But "pointers" in the form of hard links or symlinks are brittle, hence storing metadata as a separate file is perilous. Moreover, the main file can be overwritten with completely different data, and the metadata can become out of date. By contrast, if the metadata points to the main data by hash, then the reference is unambiguous, and the metadata can never accidentally point to the "wrong" file in the future.
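A rough sketch of the difference, with plain dicts standing in for the two kinds of file system. The record layout (target and tags fields) and the file contents are invented for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Mutable, path-addressed files: the name stays the same while the bytes can change.
files_by_path = {"Mountain.jpg": b"photo of a mountain"}

# Immutable, hash-addressed blobs: the name is derived from the bytes.
blobs_by_hash = {}
original = files_by_path["Mountain.jpg"]
original_hash = sha256_hex(original)
blobs_by_hash[original_hash] = original

# Two hypothetical metadata records describing the same photo.
tags_by_path = {"target": "Mountain.jpg", "tags": ["mountain", "hiking"]}
tags_by_hash = {"target": original_hash, "tags": ["mountain", "hiking"]}

# The user overwrites the file with unrelated content under the same name.
files_by_path["Mountain.jpg"] = b"scan of a tax form"
new_hash = sha256_hex(files_by_path["Mountain.jpg"])
blobs_by_hash[new_hash] = files_by_path["Mountain.jpg"]

# The path-based record now silently describes the wrong data.
path_target = files_by_path[tags_by_path["target"]]
# The hash-based record still resolves to exactly the bytes it was written for.
hash_target = blobs_by_hash[tags_by_hash["target"]]
```

After the overwrite, the "mountain, hiking" tags attached by path are sitting on a tax form, while the tags attached by hash still point at the original photo.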

2 comments

A long time ago, around when I was first taking systems programming courses, I had this vision for a filesystem and file explorer that would do exactly what you say. I imagined an entire OS without any file paths for user data (in the traditional, hierarchical sense). My opinion (both now and back then) was that tree structures as a personal data filing system almost always created more of a mess than they solved, especially for non-techies.

Rather, everything would automatically be ingested, collated, categorized, and (of course) searchable by a wide range of metadata. Much of it would be automatic, but it would also support hand-tagging files with custom metadata, like project or event names, and custom "categorizers" for more specialized file types.

Depending on the types of files, you could imagine rich views on top -- like photos getting their own part of the system with time-series exploration tools, geolocation, and person-tagging with face recognition, or audio files being automatically surfaced in a media library, with heuristics used to classify by artist, genre, etc. But these views would be fundamentally separate from the underlying data, and any mutations would be stored as new versions on top of underlying, immutable files, making it easy to move things between views or upgrade the higher level software that depended on views.

This was years ago, and I never got around to doing any of that (it would've been a massive project that likely would've fallen flat on its face). And now, in a roundabout kind of way, we've ended up with cloud-based systems that accomplish a lot of what I had imagined. I'd go so far as to say that local filesystems are quickly becoming obsolete for the average computer-user, especially those who are primarily on phones and tablets. It's a lot more distributed across 3rd party services than what I had in mind, but that at least makes it "safer" from being lost all at once (despite numerous privacy concerns).

Part of that is kind of what Apple has been going for over the past couple of years with macOS, even though they haven't gone all in by removing the hierarchical part (since there is so much legacy software and users would revolt).

A new user profile will come with a prominent "All my files" live search shortcut that just shows all your files in a jumble sorted by when you last used them. Then they expect you to search and filter through them by metadata (which is automatically extracted/indexed by Spotlight). Then you can save these searches/filters as saved searches which are live-updating virtual folders.
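Conceptually, a saved search is just a stored query that gets re-run every time you open it. The toy model below shows that idea only; it is not Spotlight's API, and FileRecord, Index, and saved_search are names invented for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    name: str
    kind: str
    tags: set = field(default_factory=set)


class Index:
    """Hypothetical metadata index; a real one would be populated by the OS."""

    def __init__(self):
        self.records = []

    def add(self, record: FileRecord) -> None:
        self.records.append(record)

    def saved_search(self, predicate):
        # Store the query, not the results: the returned callable re-runs
        # the predicate against the current index every time it is opened.
        return lambda: [r.name for r in self.records if predicate(r)]


index = Index()
index.add(FileRecord("IMG_1234.jpg", "image", {"vacation"}))
index.add(FileRecord("budget.pdf", "document", {"taxes"}))

vacation_photos = index.saved_search(
    lambda r: r.kind == "image" and "vacation" in r.tags
)

before = vacation_photos()
index.add(FileRecord("IMG_1235.jpg", "image", {"vacation"}))
after = vacation_photos()  # the new photo shows up without re-saving anything
```

Because the "folder" holds a query rather than a list of files, the same file can appear in any number of these views without being moved or copied.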

If you were new to modern macOS (and iOS with the Files app) you might end up with something similar. Applications dump things in the main Documents folder (with user-chosen names, but those are necessary metadata). You can then tag items with various labels (essentially adding more metadata), and everything is searchable through Spotlight or the search function of the file manager using the user-given name, tags, and metadata (documents edited today, Pages files).

Photos and videos are managed entirely in the Photos app, and organised almost exactly according to your suggested categories (literally called memories (for events), places, people). iTunes handles audio files automatically (you can sync your own files into Apple Music, where they're categorised in the same way as any other music).

As I understand it, APFS also handles copying and modifying in a similar way to your description, where a copy of a file is treated as a mutation of the previous version.

Everything is even synced through iCloud to all your devices, with all macOS devices keeping a rather complete copy, unless they run out of disk space.

This would require someone to have their first experience of computing in the modern Apple ecosystem (literally iOS 11 and up) to avoid preconceptions about filesystems, since traditional folders are still supported, but it's possible.

One thing I'd love to see, in conjunction with this, is some kind of MVCC with snapshot transactions at the filesystem level. So you don't really mutate files - you create new versions of them, and then old versions get GC'd eventually if nothing references them (which may not be the case if you e.g. have a backup).
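Roughly what I have in mind, as a toy in-memory model. This is not any real filesystem's API; VersionedStore and its methods are made up for the sketch, and a real implementation would also need isolation for concurrent writers, which this ignores.

```python
class VersionedStore:
    """Toy MVCC: writes append new versions; snapshots pin what they can see."""

    def __init__(self):
        self._versions = {}   # name -> list of (txid, data), oldest first
        self._txid = 0        # id of the most recent write
        self._snapshots = []  # txids pinned by open snapshots (e.g. a backup)

    def write(self, name, data):
        # Never overwrite: each write becomes a new version with a new txid.
        self._txid += 1
        self._versions.setdefault(name, []).append((self._txid, data))
        return self._txid

    def snapshot(self):
        self._snapshots.append(self._txid)
        return self._txid

    def release(self, snap):
        self._snapshots.remove(snap)

    def read(self, name, snap=None):
        # Without a snapshot, read the latest version; with one, read the
        # newest version that existed when the snapshot was taken.
        limit = self._txid if snap is None else snap
        visible = [d for t, d in self._versions.get(name, []) if t <= limit]
        if not visible:
            raise KeyError(name)
        return visible[-1]

    def gc(self):
        # Keep, per file, only the version each open view would read.
        views = set(self._snapshots) | {self._txid}
        removed = 0
        for name, versions in self._versions.items():
            keep = set()
            for view in views:
                visible = [t for t, _ in versions if t <= view]
                if visible:
                    keep.add(visible[-1])
            removed += sum(1 for t, _ in versions if t not in keep)
            self._versions[name] = [(t, d) for t, d in versions if t in keep]
        return removed


store = VersionedStore()
store.write("notes.txt", b"v1")
backup = store.snapshot()            # the backup pins the state after v1
store.write("notes.txt", b"v2")
store.write("notes.txt", b"v3")

seen_by_backup = store.read("notes.txt", backup)
latest = store.read("notes.txt")
removed_while_pinned = store.gc()    # v2 is unreachable; v1 is held by the backup
store.release(backup)
removed_after_release = store.gc()   # now v1 can go as well
```

The backup keeps reading v1 no matter what happens afterwards, and v1 is only collected once the backup lets go of it.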

Problem is, our existing file I/O APIs are very much centered around the notion of mutable files, and globally shared state with no change isolation.