You're exhibiting shortsightedness when it comes to "source". If I give you some RPG [1] or maybe some ALGOL 58 [2] source code, are you going to just compile and run it, no problem? How about some FLOW-MATIC [3]?
Yes, programming languages come and go, but I don't see how that matters. Some future historian will either have access to a working copy of xz or they will not. If they don't, and they want to implement it, having a copy of the source code is far better than anything else you could give them. Sure, future programming languages will be quite different, but humans will certainly be able to read and understand C code. If humanity has forgotten how to read C code (and lost all knowledge of it), how are they going to read this documentation you seem to prefer? Human languages come and go too.
Any argument you can make about historians being able to recover dead human languages applies equally to their ability to recover dead computer languages, and there is no better or more accurate specification than the actual code.
So let me add to my recommendation: in addition to a copy of the xz source code, include a plain text copy of any 'how to program in C' book, or just the Wikipedia page for the C language. That is more than enough for them to construct a program that can decompress xz files, once they relearn how to read whatever long dead language the book is written in (Ancient Pre-Cataclysm Earth English for example).
> If humanity has forgotten how to read C code (and lost all knowledge of it), how are they going to read this documentation you seem to prefer?
Sure, but are they going to remember things like weird precedence rules (See: &), undefined behaviour, etc.? Just because they want to reimplement a specific, small program does not mean they want to relearn several languages. What you're saying could easily blow up from 'how to code C' to 'reading the GCC / Clang compiler source code to figure out how a specific UB was implemented, which the program in this specific case falls into', which I'm sure nobody wants to spend their weekend doing. Implementing something like `xz` could simply be a stop on the way to their real destination; they don't want to spend weeks digging up COBOL. Have at least some consideration for the human element, jeez.
Documentation, specifically _mathematical_ documentation, is more fault tolerant than either pseudocode or actual code.
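To make the `&` precedence point concrete, here is a minimal sketch (the function names are made up for illustration). In C, `==` binds tighter than `&`, so the first function does not test the low bit at all:

```c
#include <stdbool.h>

/* Looks like an evenness test, but == binds tighter than &,
   so this parses as x & (1 == 0), i.e. x & 0, which is always 0. */
bool is_even_buggy(unsigned x) {
    return x & 1 == 0;
}

/* Parenthesized version: tests the low bit as intended. */
bool is_even(unsigned x) {
    return (x & 1) == 0;
}
```

Modern compilers can warn about the first form, but it is still valid C, and a reader who only knows the rules from a textbook summary could easily misread it.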
At any other time, I would agree with you, but where archivism is concerned, I do not.
Are you saying it would be easier to implement xz from mathematical documentation than from a computer program? I don't think so. I have tried (multiple times) to implement algorithms from "mathematical documentation" in academic papers, and it usually goes very badly: there are always missing parts. If I had a choice, I'd choose ALGOL-58 over a human-language description any time.
> What you're saying could easily blow up from 'how to code C' to 'reading the GCC / Clang compiler source code to figure out how a specific UB was implemented, which the program in this specific case falls into', which I'm sure nobody wants to spend their weekend doing
There will be many, many people who will gladly dig into the minutiae and technical details of arcane hardware, especially when it means making progress towards filling in the historical record. This is already the case today: there is a working Colossus (https://en.wikipedia.org/wiki/Colossus_computer), reconstructed just because it was historically significant.
There are languages which achieve critical mass and stay, and languages which don't, and disappear.
RPG is still around, and IBM still sells it on their cloud. But the language is highly proprietary, so don't expect cheap access to it.
ALGOL-58 is one of the languages which died; but ALGOL-68 is in the current Debian repos, and would take under 30 seconds to install.
FLOW-MATIC has died, but COBOL is around and, again, easily installable.
I think you are underestimating how much legacy software there is. For example, Fortran 77 is still actively used, and new programs are written in it every day. There is an immense amount of code written in C89. Support for those languages is likely to stay forever.
In general, I think this topic is very interesting. Imagine 1000 years have passed, and all the computers are running YEAR3000 architecture which is incompatible with all the software we have today. Archeologists discover a treasure trove of texts and binary files from the 21st century internet. They know ASCII and English, but nothing else. What can they do?
You'd need to manually port this code to whatever language you are using now. But this should be doable -- the software has 6000 lines of very straightforward C89 code. It does not use any OS services, nor does it rely on UB or complex language features.
(2) Use it to boot Linux (the image is included in that webpage). This allows you to run Ubuntu from 2009 on your YEAR3000 architecture.
(3) If your archive contains a repository snapshot from 2009, copy it to your machine. You can now install and run all the 21st century software on your YEAR3000 computers. Congrats!
(4) The only thing missing is graphics support. Just run x11vnc (included in the Jaunty repo) over the serial port (included in dmitry.gr's emulator). The VNC protocol is simple and well specified.
... and that's how I'd bootstrap 21st century computing on 30th century infrastructure. Sure, it will take some effort, but this only needs to be done once, and running programs will be easy from there on.
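As a rough illustration of how simple the VNC (RFB) protocol is: a session opens with a fixed 12-byte ASCII greeting of the form "RFB xxx.yyy\n", as described in RFC 6143. Here is a minimal sketch of parsing that greeting. The function name is hypothetical and is not taken from x11vnc or any other real client:

```c
#include <ctype.h>
#include <string.h>

/* Parse the 12-byte RFB ProtocolVersion greeting, e.g. "RFB 003.008\n".
   Layout: "RFB " (bytes 0-3), three digits (4-6), '.' (7),
   three digits (8-10), '\n' (11).
   Returns 1 and stores the version numbers on success, 0 otherwise.
   The function name is made up for this example. */
int parse_rfb_version(const char *msg, int *major, int *minor) {
    if (memcmp(msg, "RFB ", 4) != 0 || msg[7] != '.' || msg[11] != '\n')
        return 0;
    for (int i = 4; i < 11; i++) {
        if (i == 7)
            continue; /* skip the '.' separator, already checked */
        if (!isdigit((unsigned char)msg[i]))
            return 0;
    }
    *major = (msg[4] - '0') * 100 + (msg[5] - '0') * 10 + (msg[6] - '0');
    *minor = (msg[8] - '0') * 100 + (msg[9] - '0') * 10 + (msg[10] - '0');
    return 1;
}
```

The caller is assumed to have already read exactly 12 bytes from the serial link. Everything after this greeting (security negotiation, framebuffer updates) is more work, but it follows the same pattern of fixed-layout messages.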
https://en.wikipedia.org/wiki/Egyptian_hieroglyphs
https://en.wikipedia.org/wiki/Judaeo-Aragonese
https://en.wikipedia.org/wiki/Latin