This is not a solvable problem without technological continuity, or some unimaginably smart technology we can't imagine today. If you found a mysterious archive object and had no idea what it was - CD-R, hard drive, SSD, whatever - not only would you have to reinvent an entire hardware reader around it, you would also have to work out the file structure, extract the data (some of which could be damaged), and reverse engineer the container file formats and the data structures inside them. If you got all of that right, you'd eventually be able to start trying to translate the content of the text, audio, images, videos (how many compression formats are there?) into something you could understand. A much more advanced civilisation would struggle with making a cold start on all of that. In our current state, we'd get nowhere if we didn't already have some records explaining where to begin.
1. Even if the CD-R has been crushed and shattered, you could use a cheap modern microscope to read continuous runs of pits and lands off the disc [0,1]. It would be clear to anyone familiar with information theory how to translate the pits and lands into a series of arbitrary symbols that encode data.
2. This data would at first be meaningless. However, the mathematical relationships of a simple error-correcting code would stand out. This would allow them to recover corrupted data. Once the error-correcting code was stripped out, they would have a transcript of the raw data.
3. They would notice a pattern in the data. There would be long high-entropy regions and then very short low-entropy regions. They would probably notice that some of the low-entropy regions had every eighth bit set to zero (ASCII) and that, taken in 8-bit chunks, these regions had roughly the same number of distinct symbols as the Latin alphabet. If they were familiar with English, they might quickly decode these regions by matching letter frequencies against another English text.
4. The high-entropy regions would be far harder to decode. However, these future archaeologists would be faced with the obvious repeating data patterns of MP3 frames. Decoding the first MP3 would be a serious project involving many institutions over many years, but once it was done it would allow the decoding of all artifacts that use MP3 and related encoding formats. Possibly someone would find a "rosetta file" [2], a disc that contained both a .wav file and an encoded MP3 of the same song. More likely someone would find an MP3 player and then reverse engineer the decoding algorithm.
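
To make step 3 a bit more concrete, here is a rough Python sketch of how one might separate text-like regions from compressed-looking ones in a raw byte dump. Everything in it is my own assumption for illustration: the function names, the 256-byte window, and the 6.0 bits-per-byte threshold are arbitrary choices, and a real analysis would have to discover the byte size and alignment first rather than take them as given.

```python
import math
import random
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte (0 to 8)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    counts = Counter(chunk)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def top_bit_always_zero(chunk: bytes) -> bool:
    """True if every byte has its high bit clear, as 7-bit ASCII text does."""
    return all(b < 0x80 for b in chunk)


def classify_regions(data: bytes, window: int = 256, threshold: float = 6.0):
    """Yield (offset, entropy, label) for each fixed-size window of the dump.

    The window size and threshold are illustrative guesses, not tuned values.
    """
    for offset in range(0, len(data), window):
        chunk = data[offset:offset + window]
        entropy = shannon_entropy(chunk)
        if entropy >= threshold:
            label = "compressed-like"
        elif top_bit_always_zero(chunk):
            label = "text-like"
        else:
            label = "other"
        yield offset, entropy, label


# Toy "disc image": one window of English text followed by one window of
# pseudo-random bytes standing in for compressed audio.
text = (b"the quick brown fox jumps over the lazy dog " * 6)[:256]
rng = random.Random(0)  # fixed seed so the demo is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(256))

regions = list(classify_regions(text + noise))
for offset, entropy, label in regions:
    print(f"offset {offset:4d}: {entropy:.2f} bits/byte -> {label}")
```

On this toy input the first window comes out as text-like and the second as compressed-like. It obviously skips the hard parts (finding the symbol width, stripping the error correction), but it shows why the low-entropy regions would stand out quickly once the raw data was in hand.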
[0]: "Being able to see the tracks and bits in a CD-ROM" https://superuser.com/questions/870776/being-able-to-see-the...
[1]: "CD-ROM Under the Microscope" https://www.youtube.com/watch?v=RZUxemOE07Q
[2]: https://en.wikipedia.org/wiki/Rosetta_Stone