
Sure, what do you want to know?

I currently work on synbio × web archival.

Some of us are cooking up futuretech aimed at storing all of IA (archive.org) in a shoebox. Others are working on putting archival tools in the hands of more ordinary web users, and making those tools do things that people tend to value more in the short term, like helping them understand what they're researching, rather than merely stashing pages.

My ambitions for web archives are outsized compared to other archivists', but I'm fine with that. I'm looking beyond web archives as we currently understand them, toward web archives as something that doesn't quite exist yet: everyday artefacts, colocated and integrated with other web technology to such an extent that they serve essential sensemaking, workflow, and maybe security roles.

Right now, some obvious, pressing priorities are (a) preserving vastly more content and (b) doing more with the archives themselves.

A: The overwhelming majority of born-digital content is lost within a far narrower time-slice than would admit preservation at current rates, and data growth is accelerating beyond the reach of conventional storage media. So, for me, the world's current largest x is never the true object of my desire. I'm after a way to hold the world that is and the world to come.

Ideally, that world to come is one where lifelong data stewardship of everything from your own genome to your digital footprint is ubiquitously available and loss of information has been largely rendered optional.

This, of course, requires magic storage density that simply defies fundamental limitations of conventional storage media. I'm strongly confident that we're getting early glimpses of the first real Magic contenders. All lie outside, or on the far periphery of, the evolutionary tree that got us the storage media we have today. For instance, I'm running an art exhibition that involves encoding all the works on DNA.

B: Distributed archival that comes almost as naturally as browsing is well within reach, and with that comes some very new potential for distributed computation on archives. One hand washes the other.

One important thing to realize here is that, in many cases, you can name a very small handful of individuals as the reason why current archival resources exist. GPT-3 is scratching the surface by training on data produced by one guy named Sebastian, for instance.

…I'm sorta tired and have to respond to something about every Twitter snapshot since June being broken, though, so I'll pick this back up later.