by carboncopy 3769 days ago
What are these "weird formats" that keep getting referenced here? LZH? ARC? PKZIP? WordPerfect? Filesystems formatted in FAT16? All of these formats are still very much readable.

Some formats that are no longer usable in general, such as punch cards, were obsoleted consciously. And you might struggle to find a LaserDisc reader. Otherwise, we should be fine. Or am I missing something?

8 comments

Here's an example from Brian Moriarty's retrospective on the game "Loom" [1] in 2015, a mere 27 years after work on it began in 1988:

"The third disk here is the only known copy of the original design documents for Loom. It's an 800K Macintosh floppy employing a proprietary format readable only by a vintage Macintosh drive (thank you Apple). In preparing for this lecture, I obtained a dusty old Mac which had not been turned on since 1997. After reseating all the cables and boards I got it to boot and determined that the files on that disk are still intact and still fully readable. A few moments after making this happy discovery however, and before I could actually retrieve those files, the hard drive in the Mac crashed and died permanently. I'll recover the files eventually, but for now you'll have to rely on my failing memory for how Loom was conceived."

[1]: http://ludix.com/moriarty/loom.html (I transcribed the passage above from the video)

So I've got an HTML file and a PDF file; one was generated from the other. In the generated file, when I go to highlight some portions, I discover that it has been converted into a column format and it pastes wrong. It is easy to imagine that PDF being converted into a third format, and that conversion going slightly wrong because of the aforementioned weirdness.

So imagine you have generations of digital files, converted from format to format to format. Why? Because software engineers are assholes who keep inventing new formats, for stupid goddamn reasons. A hundred generations from now, you'll be looking at copies of copies of copies which aren't just checksummable bit-wise copies but transcriptions, with transcription errors. If our descendants are lucky, all of those different versions will be preserved as well, but that just means that if you notice a possible transcription error, you'll have the opportunity to dig into two-hundred-year-old character and file encodings to try to figure out what the original text was.

Which is not that different of a situation from trying to guess whether a scribe three copies ago misread the scribe four copies ago's atrocious f for a t.
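
To make the difference between a checksummable copy and a transcription concrete, here is a minimal Python sketch. The helper names (sha256_hex, is_bitwise_copy, transcribe_via_ascii) and the sample string are made up for illustration, and the ASCII round trip is only an assumed stand-in for whatever lossy format conversion a real archive might go through.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_bitwise_copy(original: bytes, candidate: bytes) -> bool:
    """True when the two byte strings have the same SHA-256 digest."""
    return sha256_hex(original) == sha256_hex(candidate)


def transcribe_via_ascii(text: str) -> str:
    """Hypothetical lossy conversion: push text through an ASCII-only format.

    Characters the target cannot represent are replaced with '?',
    standing in for any format-to-format conversion that drops information.
    """
    return text.encode("ascii", errors="replace").decode("ascii")


# Sample text with two non-ASCII characters (i-diaeresis and e-acute).
original_text = "na\u00efve caf\u00e9"
original_bytes = original_text.encode("utf-8")

# Generation 1: a plain byte-for-byte copy. The checksum confirms fidelity.
copied_bytes = bytes(original_bytes)
print(is_bitwise_copy(original_bytes, copied_bytes))

# Generation 2: a "transcription" through a narrower format, then back to UTF-8.
transcribed_text = transcribe_via_ascii(original_text)
transcribed_bytes = transcribed_text.encode("utf-8")
print(is_bitwise_copy(original_bytes, transcribed_bytes))
print(transcribed_text)
```

The checksum can tell you that the second generation differs from the first, but not what the lost characters were. That part is left to whoever is digging through the old encodings.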

Yeah, converting from one format to another that isn't completely compatible might be an issue in the future. Even how to preserve websites isn't exactly intuitive because, as you saw, the conversion to PDF was faulty, and doing "file" > "save as" would not yield the same HTML because the browser modifies the DOM. We have to start using formats specifically designed for archiving, such as WARC for webpages: http://www.digitalpreservation.gov/formats/fdd/fdd000236.sht.... (relevant and interesting site in general, run by the Library of Congress)
Yes, you're missing the hindsight bias. People using punch cards and laserdiscs weren't warned ahead of time that things stored on them would become unretrievable soon after. Nobody knows in advance which formats/media will be "obsoleted consciously".
The overlap between punch cards and other formats was decades.

"During the 1960s, the punched card was gradually replaced as the primary means for data storage by magnetic tape, as better, more capable computers became available. ... [P]unched cards were still commonly used for data entry and programming until the mid-1980s when the combination of lower cost magnetic disk storage ... made punched cards obsolete ..."[1]

Yes, I listed Wikipedia as a primary source, deal with it.

[1] https://en.wikipedia.org/wiki/Punched_card

And remember, punch cards and laserdiscs are famous obsolete storage formats. What if you put your important information on an Iomega Jaz disk, or stored it in a Bernoulli Box cartridge?
You might end up in a situation like this: http://openmoko.soup.io/post/124162230/Harald-LaF0rge-Welte-... Fast forward another 10 years, and that particular situation might be worse still.
Try MS Excel for Mac v1 - those spreadsheets are going to be unfun to open. Or MS Works spreadsheets without MS Works.
I have lots of Apple II AppleWorks files on floppies.

I'm not sure if I'll ever be able to open them. I don't even have a way of getting them off the floppy.

My dad still has a laser disc reader and 100s of discs. Sigh.
MUMPS.
MUMPS still sees a lot of active use today -- and MUMPS databases store data in plaintext, in ways that tend to be pretty well-documented and relatively easy to understand. EBCDIC, Betamax, laserdisc, Zip drives, and HD-DVD are probably better examples.
With all those positives, would you use it?
Didn't you say that MUMPS is an obsolete or generally weird format, not that it's something you wouldn't use?

But yes, I use it, both professionally and personally. Legacy MUMPS is not pleasant -- the VA's code looks like someone left their telegraph out in the rain -- but the language is astonishingly good at string-handling, mostly due to certain constructs that make it very hard to translate into normal languages.