Hacker News
by frobozz 2390 days ago
> I do wonder whether digital or analogue formats are better able to survive into the distant future.

There are 5000 year old clay tablets we can still read.

There are centuries old documents on paper, vellum etc. that we can still read.

I personally have decades-old paper documents I can easily read, and a box of floppies I can't.

It's not just a problem of unreadable physical media: I have a database file on a perfectly readable HD that was generated by an application that is no longer available. I might be able to interrogate it somehow, but it won't be easy.

Digital formats and connectivity make LOCKSS ("Lots Of Copies Keep Stuff Safe") easier, so that's a plus. There's less chance of a fire or flood or space-limited librarian destroying the last known copy. However, without archivists actively transforming content to new formats as required, it might only take a few decades before a lot of content starts to require a massive effort to read.


Clay is the plastic of the ancient world.

Let's say the probability that a single copy of a physical book survives 1,000 years, is found and is understood by an archaeologist*, is pB, and the probability that a single copy of a book on an SSD survives 1,000 years, is found and is understood by an archaeologist is pD. Even if pB is far larger than pD, there could be so many more copies of a single book held on SSDs, thus making it more likely the book will survive via an SSD than a physical book. On the other hand, the technology to recover data from SSDs might not exist in 1,000 years.
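To make the copy-count argument concrete, here is a minimal sketch assuming every copy makes it (survives, is found, is understood) independently with the same probability p. With n copies, the chance that at least one makes it is 1 - (1 - p)^n. The probabilities and copy counts below are made-up illustrative numbers, not estimates.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n copies survives, is found and is understood,
    assuming each copy does so independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # P(no copy makes it) = (1 - p)^n, so P(at least one) is the complement.
    return 1.0 - (1.0 - p) ** n


# Hypothetical, illustrative numbers only (not estimates):
# each book is far more durable than each SSD copy (pB >> pD),
# but SSD copies are far more numerous.
pB, n_books = 1e-3, 100
pD, n_ssd_copies = 1e-6, 1_000_000

print(f"via books: {p_at_least_one(pB, n_books):.3f}")
print(f"via SSDs:  {p_at_least_one(pD, n_ssd_copies):.3f}")
```

With these made-up numbers the SSD route comes out ahead (roughly 0.63 versus 0.095), even though each SSD copy is a thousand times less likely to make it. The independence assumption is the weak point: copies on the same server, or in the same landfill, tend to share a fate.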

It could also be the case that each generation would copy these books onto new digital media, providing an unbroken chain of copies. The oldest complete manuscript of the Iliad, Venetus A, dates from the 10th century AD (roughly 1,000 years ago), even though the Iliad was probably first written down around 800 BC (some 2,800 years ago). It was copied from copies of copies of earlier copies.

I really don't know how this will play out, and I've been unable to find research on how long SSDs and other flash-memory-based media survive, especially if buried in a landfill.

* If archaeologists exist in the future. The current push by STEM boosters to defund and de-emphasize the humanities may result in a near future without archaeologists or funded archaeological projects. Over 1,000 years, the entire field could die.

> thus making it more likely the book will survive via an SSD than a physical book

Yes. That's what I mean by LOCKSS being easier.

> is found and is understood by an archaeologist,

There is a problem with merging these two probabilities.

The probability of finding a book is of course massively smaller than the probability of finding a digital copy.

The probability of understanding a book is so much greater than the probability of understanding a file on a disk.

This makes it more likely that the physical book will survive in a meaningful way.
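The objection can be written out by splitting each per-copy probability from the parent comment into a "found" factor f and an "understood" factor u (these symbols are introduced here for illustration, with survival folded into f):

```latex
\begin{aligned}
p_B &= f_B \, u_B, \qquad p_D = f_D \, u_D, \\
p_D > p_B &\iff \frac{f_D}{f_B} > \frac{u_B}{u_D} \quad (f_B > 0,\ u_D > 0).
\end{aligned}
```

Merging the factors hides the second ratio: if u_D is close to zero because nobody can decode the file, no realistic advantage in f_D makes up for it.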

> It could also be the case that each generation would copy these books onto new digital media

This is what I mean by archivists actively transforming the content. Regarding written content like the Iliad, copies and translations can be made centuries apart. Content in digital formats may need to be transformed whenever the application that reads it is discontinued.

Would an SSD even function after 1,000 years? Unless sealed, I imagine ambient moisture would do a number on the drive's internals. The same is true for books, of course, but we still have 1,000-year-old books that have lasted by sitting on shelves in churches, temples, etc., without any specific care until recently.

The nice part of a book in an apocalyptic scenario is that you can copy it even if you don't know the language. You don't need a special tool for this, only one capable of marking a surface. It wouldn't be fun or fast, but it's possible, and it's what monks did for centuries. Would archaeologists 1,000 years from now be lucky enough to find a SATA cable too?

It doesn't really matter if the SSD as a whole still works, because after 1,000 years you'll never recover the data via the normal interface. Modern MLC flash is often specified for less than 1 year of data retention, and even SLC is unlikely to make it to 1,000 years. Attempting to read it will only make things worse ("read disturb"). The best hope of saving the data is some future nanotech that directly probes each floating-gate transistor and counts its electrons, combined with reverse-engineering all the error correction and wear leveling.

I would assume they would read the SSD not by powering it on and plugging it into a computer, but by disassembling it and imaging its physical structure. This would also bypass all the wear-leveling infrastructure, allowing them to recover deleted data. It reminds me of the current technique of using X-rays to read writing on the odd scraps of paper used to bind a book [0].

[0]: "X-rays reveal 1,300-year-old writings inside later bookbindings" https://www.theguardian.com/books/2016/jun/04/x-rays-reveal-...

No one is proposing we use floppy disks.

Redundant, shared servers ARE a forever solution. Making sure your data is on one of the ones that make it seems like a vastly easier proposition to me than writing data to clay tablets and trying to keep those from ending up in a dump somewhere.

What is the likelihood that historians a century or two hence will have an application capable of turning an ISO 32000-1 (PDF) file into human-readable text?

If we are talking about archaeologists, rather than historians, even ASCII and Unicode could be a challenge to work out.
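A small illustration of why even plain text needs outside knowledge to decode: the same two bytes read as different characters depending on which encoding the reader assumes. This uses only Python's built-in codecs; the byte values are an arbitrary example.

```python
# Two bytes as they might sit on a recovered disk.
raw = bytes([0xC3, 0xA9])

# Interpreted as UTF-8, they form a single character (U+00E9, "é").
as_utf8 = raw.decode("utf-8")

# Interpreted as Latin-1 (ISO 8859-1), each byte is its own character ("Ã©").
as_latin1 = raw.decode("latin-1")

# Interpreted as 7-bit ASCII, they are not valid text at all.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(as_utf8), repr(as_latin1), ascii_ok)
```

Nothing in the bytes themselves says which reading is right; that knowledge has to survive alongside the data.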

Because those hundreds of years don't pass in an instant. Along the way, formats will be deprecated and replaced, and there will be transcoders you can run in batch. Sure, it relies on intervention, but the upside is that everyone else can copy that one person's work.
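A minimal sketch of the kind of batch transcoder described here, assuming the legacy content is plain text in a single known 8-bit encoding (cp1252 is just an example) and that UTF-8 is the new format. `transcode_tree` and the directory names are hypothetical; migrating structured formats such as application databases is much harder than re-encoding text.

```python
import tempfile
from pathlib import Path


def transcode_tree(src: Path, dst: Path, src_encoding: str = "cp1252",
                   pattern: str = "*.txt") -> int:
    """Hypothetical helper: re-encode every file under `src` matching `pattern`
    as UTF-8 under `dst`, keeping the relative layout. Returns the file count."""
    count = 0
    for path in sorted(src.rglob(pattern)):
        if not path.is_file():
            continue
        # Strict decoding: undecodable bytes raise instead of being silently mangled.
        text = path.read_bytes().decode(src_encoding)
        out = dst / path.relative_to(src)
        out.parent.mkdir(parents=True, exist_ok=True)
        # Working on bytes leaves line endings untouched.
        out.write_bytes(text.encode("utf-8"))
        count += 1
    return count


# Usage: build a throwaway "legacy" tree and convert it.
with tempfile.TemporaryDirectory() as tmp:
    old, new = Path(tmp, "old"), Path(tmp, "new")
    (old / "letters").mkdir(parents=True)
    (old / "letters" / "note.txt").write_bytes("café".encode("cp1252"))
    converted = transcode_tree(old, new)
    result = (new / "letters" / "note.txt").read_bytes()

print(converted, result)
```

Once written, a script like this can be rerun by anyone holding a copy of the old files, which is the upside the comment points to.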

Yes, we should learn from history, but we should also not assume that everything that happened before will happen the same way again, given how much of our world has changed.

> However, without archivists actively transforming content to new formats as required, it might only take a few decades before a lot of content starts to require a massive effort to read.

More effort than batch-reading physical books and tablets in old languages?

Interfaces are easier to reuse with data, and current ML could probably already do some of the work of interpreting old data, not to mention what we'll have 50 years from now.

0.99999 at least.

Compare the capabilities of digital historians today with those of 10 and 20 years ago. It’s night and day.