One nice thing about digital data, as opposed to physical artifacts, is that you don’t need to keep its metadata attached to the data “at the hip.” Through the magic of cryptographic hash algorithms, you can keep your data sets floating around “raw” (as in these torrents) and then, elsewhere, ascribe metadata to the hash of the content it is meant to annotate. Later, you can reassemble the two in either order: either by first finding a data set, hashing it, and looking up its metadata in some metadata-hosting service; or by first browsing a catalogue of indexed metadata, finding a data set that meets your needs, and then retrieving it by its hash.

Which is to say: with digital data, library science (creating metadata and chains of custody, and indexing them for search) and archiving (ensuring access to pristine artifacts over time) don’t need to happen at the same time, in the same place. There can be separate “artifact hosting” and “metadata library” services. (This is especially helpful where private IP is involved: your metadata library can still hold the metadata for a data set you don’t have the rights to, and those with the rights can go get the data set themselves.)
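To make the two-service split concrete, here is a minimal Python sketch. It assumes SHA-256 as the hash and uses plain in-memory dictionaries as stand-ins for the two services. `ArtifactStore`, `MetadataLibrary`, and their methods are hypothetical names chosen for illustration, not any existing tool’s API.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the raw bytes; this is the content's identifier."""
    return hashlib.sha256(data).hexdigest()


class ArtifactStore:
    """Hypothetical "artifact hosting" service: raw bytes keyed by their hash.

    It knows nothing about what the bytes mean."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = content_hash(data)
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hash on the way out so corrupted or substituted bytes are caught.
        # A real client would repeat this check rather than trust the host.
        if content_hash(data) != digest:
            raise ValueError("stored bytes do not match their hash")
        return data


class MetadataLibrary:
    """Hypothetical "metadata library" service: annotations keyed by hash.

    It never holds the data itself, so it can describe data sets it has
    no rights to redistribute."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def annotate(self, digest: str, **metadata) -> None:
        # Merge new fields into whatever is already recorded for this hash.
        self._records.setdefault(digest, {}).update(metadata)

    def lookup(self, digest: str) -> dict:
        # Data-first direction: hash what you have, then ask what it is.
        return self._records.get(digest, {})

    def search(self, **criteria) -> list[str]:
        # Metadata-first direction: find matching entries, get hashes to fetch.
        return [
            digest
            for digest, meta in self._records.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


if __name__ == "__main__":
    store = ArtifactStore()
    library = MetadataLibrary()

    raw = b"station,temp\nA,21.5\n"
    # The raw data goes to one service...
    digest = store.put(raw)
    # ...and its description goes to another, linked only by the hash.
    library.annotate(digest, title="Weather readings", license="CC-BY-4.0")

    # Data-first: found the bytes somewhere, hash them, look them up.
    print(library.lookup(content_hash(raw))["title"])  # prints: Weather readings
    # Metadata-first: browse the catalogue, then fetch each match by hash.
    for match in library.search(license="CC-BY-4.0"):
        print(store.get(match))
```

Because the only link between the two services is the digest, either one can be replaced, mirrored, or run by a different party without the other needing to know. For multi-gigabyte files you would feed the bytes to a `hashlib.sha256()` object’s `update()` method in chunks rather than reading the whole file into memory.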
This is especially true for research-oriented files, where consumers are often unable or unwilling to maintain a functional metadata store and do a lot of manual file handling. Saying "well, somebody could have set up a super-awesome metadata system that tracks this" doesn't magically make those resources exist.