by ltbarcly3 2564 days ago
Why would they need to recreate it solely based on 'documentation'? It is open source, the source code is the documentation. It seems just as likely that the source would survive as it is likely that some complete technical documentation would survive. Maybe they wouldn't be able to compile it (probably they would be able to compile it, I don't see why they wouldn't have some kind of computer emulator available), but it's better than any other kind of documentation you could provide.

The src/ tree of xz is 335k (compressed with gzip). If you are worried future digital historians won't be able to figure out the xz format, throw a copy of the gzip'd source onto every drive you store archives on. It would basically be free, and it would almost guarantee they have a complete copy of exactly what they need to decompress the files.


You're exhibiting shortsightedness when it comes to "source". If I give you some RPG [1] or maybe some ALGOL 58 [2] source code, are you going to just compile and run it, no problem? How about some FLOW-MATIC [3]?

Point being that computer languages come and go.

[1] https://en.wikipedia.org/wiki/IBM_RPG

[2] https://en.wikipedia.org/wiki/ALGOL_58

[3] https://en.wikipedia.org/wiki/FLOW-MATIC

Yes, programming languages come and go, but I don't see how that matters. Some future historian will either have access to a working copy of xz or they will not. If they don't, and they want to implement it, having a copy of the source code is far better than anything else you could give them. Sure, future programming languages will be quite different, but humans will certainly be able to read and understand C code. If humanity has forgotten how to read C code (and lost all knowledge of it), how are they going to read this documentation you seem to prefer? Human languages come and go too.

https://en.wikipedia.org/wiki/Egyptian_hieroglyphs

https://en.wikipedia.org/wiki/Judaeo-Aragonese

https://en.wikipedia.org/wiki/Latin

Any argument you can make about historians being able to recover dead human languages applies equally to their ability to recover dead computer languages, and there is no better or more accurate specification than the actual code.

So let me add to my recommendation: in addition to a copy of the xz source code, include a plain text copy of any 'how to program in C' book, or just the Wikipedia page for the C language. That is more than enough for them to construct a program that can decompress xz files, once they relearn how to read whatever long-dead language the book is written in (Ancient Pre-Cataclysm Earth English, for example).

> If humanity has forgotten how to read C code (and lost all knowledge of it), how are they going to read this documentation you seem to prefer?

Sure, but are they going to remember things like weird precedence rules (see: &), undefined behaviour, etc.? Just because they want to reimplement a specific, small program does not mean they want to relearn several languages. What you're saying could easily blow up from 'how to code C' to 'reading the GCC / Clang compiler source code to figure out how a specific UB was implemented, which the program in this specific case falls into', which I'm sure nobody wants to spend their weekend doing. Implementing something like `xz` could simply be a midpoint on the way to their destination; they don't want to spend weeks digging up COBOL. Have at least some consideration for the human element, jeez.
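To make the `&` point concrete, here is a minimal sketch of the kind of precedence trap a future reader would have to know about. The function names are made up for illustration and have nothing to do with the xz source; the only assumption is standard C operator precedence, where `==` binds tighter than `&`.

```c
#include <stdbool.h>

/* Reads like "is x even?", but == binds tighter than & in C,
   so this parses as x & (1 == 0), i.e. x & 0, which is always 0. */
bool is_even_buggy(unsigned x) {
    return x & 1 == 0;
}

/* The parentheses force the intended grouping. */
bool is_even(unsigned x) {
    return (x & 1) == 0;
}
```

Both versions compile (most compilers only warn about the first), so someone reading the source without knowing this rule would likely misread what the buggy one does.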

Documentation, specifically _mathematical_ documentation, is more fault tolerant than either pseudocode or actual code.

At any other time, I would agree with you, but where archivism is concerned, I do not.

Are you saying it would be easier to implement xz from mathematical documentation than from a computer program? I don't think so. I have tried (multiple times) to implement algorithms from "mathematical documentation" in academic papers, and it usually goes very badly: there are always missing parts. If I had a choice, I'd choose ALGOL 58 over a human-language description anytime.

> What you're saying could easily blow up from 'how to code C' to 'reading the GCC / Clang compiler source code to figure out how a specific UB was implemented, which the program in this specific case falls into', which I'm sure nobody wants to spend their weekend doing

There will be many, many people who will gladly dig into the minutiae and technical details of arcane hardware, especially when it means making progress towards filling in the historical record. This is already the case today: there is a working Colossus (https://en.wikipedia.org/wiki/Colossus_computer), reconstructed just because it was historically significant.

I think you missed the implication that I didn't state explicitly, but figured was pretty clear:

> which I'm sure nobody wants to spend their weekend doing [if their original goal was to simply reconstruct xz].

There are languages which achieve critical mass and stay, and languages which don't, and disappear.

RPG is still around, and IBM still sells it on their cloud. But the language is highly proprietary, so don't expect cheap access to it.

ALGOL 58 is one of the languages which died, but ALGOL 68 is in the current Debian repos and would take under 30 seconds to install.

FLOW-MATIC has died, but COBOL is around and again, easily installable.

I think you are underestimating how much legacy software there is. For example, Fortran 77 is still actively used, and new programs are written in it every day. There is an immense number of programs written in C89. Support for those languages is likely to stay forever.

In general, I think this topic is very interesting. Imagine 1000 years have passed, and all the computers are running the YEAR3000 architecture, which is incompatible with all the software we have today. Archeologists discover a treasure trove of texts and binary files from the 21st-century internet. They know ASCII and English, but nothing else. What can they do?

The answer is surprisingly simple:

(1) Write an emulator for a simple CPU, like an ARMv5. Here is a good one: https://dmitry.gr/?r=05.Projects&proj=07.%20Linux%20on%208bi...

You'd need to manually port this code to whatever language you are using now. But this should be doable -- the software has 6000 lines of very straightforward C89 code. It does not use any OS services, nor does it rely on UB or complex language features.

(2) Use it to boot Linux (the image is included in that webpage). This allows you to run Ubuntu from 2009 on your YEAR3000 architecture.

(3) If your archive contains a repository snapshot from 2009, copy it to your machine. You can now install and run all the 21st-century software on your YEAR3000 computers. Congrats!

(4) The only thing missing is graphics support. Just run x11vnc (included in the Jaunty repo) over the serial port (included in dmitry.gr's emulator). The VNC protocol is simple and well specified.

... and that's how I'd bootstrap 21st-century computing on 30th-century infrastructure. Sure, it will take some effort, but this only needs to be done once, and running programs will be easy from there on.
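To give a feel for why step (1) is tractable, here is a rough sketch of the fetch-decode-execute loop at the heart of any CPU emulator. It is emphatically not ARMv5 and not taken from the dmitry.gr code: the two-byte instruction format, the opcodes, and the names `ToyCpu` and `toy_run` are all invented for illustration. A real ARMv5 emulator has far more instructions plus memory and peripherals, but the overall shape is the same.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction set, NOT ARMv5: every instruction is
   two bytes, an opcode followed by a single operand. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_JNZ = 3 };

typedef struct {
    uint8_t acc;  /* single accumulator register */
    size_t  pc;   /* program counter: byte index into the program */
} ToyCpu;

/* Runs the program until OP_HALT (or an unknown opcode), the end of
   the program, or max_steps instructions. Returns the accumulator. */
uint8_t toy_run(const uint8_t *prog, size_t len, size_t max_steps) {
    ToyCpu cpu = {0, 0};
    for (size_t step = 0; step < max_steps && cpu.pc + 1 < len; step++) {
        /* Fetch */
        uint8_t op  = prog[cpu.pc];
        uint8_t arg = prog[cpu.pc + 1];
        cpu.pc += 2;
        /* Decode and execute */
        switch (op) {
        case OP_LOAD: cpu.acc = arg; break;
        case OP_ADD:  cpu.acc = (uint8_t)(cpu.acc + arg); break;  /* wraps mod 256 */
        case OP_JNZ:  if (cpu.acc != 0) cpu.pc = arg; break;
        default:      return cpu.acc;  /* OP_HALT or unknown opcode */
        }
    }
    return cpu.acc;
}
```

For example, running the six-byte program `{OP_LOAD, 40, OP_ADD, 2, OP_HALT, 0}` through `toy_run` leaves 42 in the accumulator. Nothing here touches an OS service or relies on UB, which is the property that makes this style of code easy to port by hand.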