The article talks about referring to resources using URLs that contain opaque ID numbers versus URLs that contain human-readable hierarchical paths and names. It gives examples like bank accounts and library books.
This URL naming problem is also present in file system design. File names can be short, meaningful, context-sensitive, and human-friendly; or they can be long, unique, and permanent. For example, a photo might be named IMG_1234.jpg or Mountain.jpg, or it can be named 63f8d706e07a308964e3399d9fbf8774d37493e787218ac055a572dfeed49bbe.jpg. The problem with short names is that they can easily collide, and they often change at the whim of the user.
The article highlights the difference between the identity of an object (the permanent long name) and searching for an object (the human-friendly path, which could return different results each time).
For decades, the core assumption in file system design has been to provide hierarchical paths that refer to mutable files. A number of alternative systems have sprouted that upend this assumption by having all files be immutable, addressed by hash, and searchable through other mechanisms. Examples include Git version control, BitTorrent, IPFS, Camlistore, and my own unnamed proposal: https://www.nayuki.io/page/designing-a-better-nonhierarchica... . (Previous discussion: https://news.ycombinator.com/item?id=14537650 )
Personally, I think immutable files present a fascinating opportunity for exploration, because they make it possible to create stable metadata. In a mutable hierarchical file system, metadata (such as photo tags or song titles) can be stored either within the file itself, or in a separate file that points to the main file. But "pointers" in the form of hard links or symlinks are brittle, hence storing metadata as a separate file is perilous. Moreover, the main file can be overwritten with completely different data, and the metadata can become out of date. By contrast, if the metadata points to the main data by hash, then the reference is unambiguous, and the metadata can never accidentally point to the "wrong" file in the future.
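To make the hash-pointer idea concrete, here is a rough sketch in Python. It assumes SHA-256 as the hash and a plain in-memory dict as the store. The ContentStore class and its method names are hypothetical, made up for illustration, and not taken from any of the systems named above.

```python
import hashlib


class ContentStore:
    """Toy in-memory store of immutable blobs, addressed by SHA-256 hex digest.

    Hypothetical illustration only; not the API of any real system.
    """

    def __init__(self):
        self._blobs = {}     # digest -> bytes (never overwritten)
        self._metadata = {}  # digest -> dict of metadata fields

    def put(self, data: bytes) -> str:
        """Store data and return its digest, which serves as its permanent name."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch the exact bytes that were stored under this digest."""
        return self._blobs[digest]

    def tag(self, digest: str, **fields) -> None:
        """Attach metadata that refers to a blob by hash rather than by path."""
        if digest not in self._blobs:
            raise KeyError(digest)
        self._metadata.setdefault(digest, {}).update(fields)

    def metadata(self, digest: str) -> dict:
        """Return a copy of the metadata attached to a digest (empty if none)."""
        return dict(self._metadata.get(digest, {}))

    def find(self, **fields) -> list:
        """Search: return sorted digests whose metadata matches every given field."""
        return sorted(
            digest
            for digest, meta in self._metadata.items()
            if all(meta.get(key) == value for key, value in fields.items())
        )


store = ContentStore()
original = store.put(b"raw photo bytes")
store.tag(original, title="Mountain", year=2017)

# An "edit" produces different bytes, hence a different name.
edited = store.put(b"raw photo bytes, cropped")

# The metadata still refers to exactly the bytes it was written for.
assert store.get(original) == b"raw photo bytes"
assert store.metadata(edited) == {}
```

The lookup by digest is the "identity" half and find() is the "search" half: search results can change as tags are added, but a digest always resolves to the same bytes.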
Rather, everything would automatically be ingested, collated, categorized, and (of course) searchable by a wide range of metadata. Much of it would be automatic, but it would also support hand-tagging files with custom metadata, like project or event names, and custom "categorizers" for more specialized file types.
Depending on the types of files, you could imagine rich views on top -- like photos getting their own part of the system with time-series exploration tools, geolocation, and person-tagging with face recognition, or audio files being automatically surfaced in a media library, with heuristics used to classify by artist, genre, etc. But these views would be fundamentally separate from the underlying data, and any mutations would be stored as new versions on top of underlying, immutable files, making it easy to move things between views or upgrade the higher level software that depended on views.
This was years ago, and I never got around to doing any of that (it would've been a massive project that likely would've fallen flat on its face). And now, in a roundabout kind of way, we've ended up with cloud-based systems that accomplish a lot of what I had imagined. I'd go so far as to say that local filesystems are quickly becoming obsolete for the average computer user, especially for those who are primarily on phones and tablets. The result is a lot more distributed across third-party services than what I had in mind, but that at least makes the data "safer" from being lost all at once (despite numerous privacy concerns).