by ooooo00000 3711 days ago
Now it's ~55 GB uncompressed and as you said that's without images, audio, and more.

I meant compressed. Text compresses really well.
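For a rough feel of how well text compresses, here is a minimal sketch using Python's standard `zlib` module. The sample string is made up and highly repetitive, so its ratio will be much better than anything real encyclopedia text would reach; `compression_ratio` is just an illustrative helper name.

```python
import zlib

def compression_ratio(text: str, level: int = 9) -> float:
    """Return original size divided by compressed size, both in bytes."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, level)
    return len(raw) / len(packed)

# Hypothetical sample: repetitive English-like text, not real Wikipedia data.
sample = "The quick brown fox jumps over the lazy dog. " * 200

ratio = compression_ratio(sample)

# Compression is lossless: decompressing gives back the exact input.
roundtrip = zlib.decompress(zlib.compress(sample.encode("utf-8"))).decode("utf-8")
```

Running the same helper on an actual article dump would give a more honest number than this toy input.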
It does indeed. However, compression may cause problems for future beings discovering or understanding the data.
Keep in mind they have to understand the text and image encodings. The concept of glyphs themselves, even. Words. Language. Bytes (why 8 bits). Bits. Knowledge expressed as sentences. How to read the media. Compression would be just another thing.

There's a pretty big hurdle to comprehension even without compression because we are thinking in human terms. Imagine all of the prerequisite human knowledge you overlook to even approach the concept of an encyclopedic article describing something. Aliens might share knowledge by hitting each other with telepathic darts for all we know, having a completely different understanding and implementation of information, and words might require years of study on their part to comprehend. Even the golden record carries a lot of assumptions. What we know about the universe is not necessarily final, even with rudimentary things like information theory.

In the end it's a bunch of bytes, numbers really, on a disc. What are numbers, even? What if they have a totally different non-numeric system to quantify and explain their existence?

Think about finding an extraterrestrial storage device like this from our perspective. I'd safely predict 20 years before we even extract one byte of data, and a lot of that time would be arguing over it, probably. Although thinking about an alien Nobel ceremony for cracking the "extraterrestrial ceramic Wikipedia" is a pretty amusing thought.

One of my favorite short stories is That Alien Message: http://lesswrong.com/lw/qk/that_alien_message/

It's about humanity trying to reverse engineer a message from space that has very few bits. One of the morals is that humans would be able to decode crazy encodings provided enough time. And more data helps a lot. With 20 GB, even compressed, common patterns could quickly be found.

I don't believe for a moment a race advanced enough to recover the disc wouldn't understand it.

Me neither, to be clear. Just saying that compression is but one drop in the bucket.
You could do something like the Rosetta disk, where you just physically engrave the glyphs.

http://rosettaproject.org/disk/concept/

Of course then you need to work out what the language is actually saying, but plain "glyph retrieval" can be done with a desktop microscope and some time. An alien that operates in a roughly human way (has eyes, language, linear writing) would probably understand that it encodes meaning, even if it ends up like Linear A and undecipherable.

That's a very inefficient way to store the data. You couldn't fit all of Wikipedia onto one of those disks like that. I would only do that for the instructions, and pictures would be better than glyphs. Or pictures next to words, so they have at least an idea of how to decipher it.

Once you introduce a few words, the rest may be decipherable from context, especially with such a large corpus. E.g. certain words will cluster together often, and once you know one, you can guess at the others, which lets you guess at others, and so on.
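The clustering idea can be sketched as simple co-occurrence counting: count how often two words share a sentence, then look up which words most often appear alongside one you already know. This is only a toy illustration with a made-up corpus; `cooccurrence` and `neighbours` are hypothetical helper names, and real decipherment would need far more than this.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(sentences):
    """Count how many sentences each unordered pair of words shares."""
    counts = Counter()
    for sentence in sentences:
        # Sorting makes each pair a canonical (smaller, larger) tuple.
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts

def neighbours(counts, known, top=3):
    """Return the words that most often share a sentence with `known`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == known:
            scores[b] += n
        elif b == known:
            scores[a] += n
    return [word for word, _ in scores.most_common(top)]

# Hypothetical mini-corpus, invented for this example.
corpus = [
    "the sun is a star",
    "the star is bright",
    "the moon orbits the earth",
    "the earth orbits the sun",
    "a bright star",
]

counts = cooccurrence(corpus)
closest = neighbours(counts, "bright", top=1)
```

With a corpus the size of Wikipedia, the same counts would be far less noisy, which is the point about a large corpus helping.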

> Think about finding an extraterrestrial storage device like this from our perspective.

Let's suppose the aliens already did this, but instead of a small disc, they wanted to send a message that no intelligent being could miss. Over a short span of 150,000 years, they redirected a bunch of meteors into the moon in a pattern that encodes the last few digits of pi in a simplified resonant-fractal numbering system, which proves they know the angular momentum of the universe with fair precision. Clear and convincing evidence that would be visible throughout the solar system.

So yeah, I think no matter what we do to try to communicate with an alien intelligence, they would have to be very much like us (probably our own descendants or - who knows - ancestors) to even recognize the presence of the simplest message, much less decode it.

> the last few digits of pi

There is no last digit of pi.

Well of course not if your number system doesn't even exhibit fractal resonance in holistic projective encodings, given the angular momentum of the universe.
Imagine how disappointed the aliens will be to find our disc full of the boring old "discoveries" of a barely space-faring species. Nothing to see here, move along.
Nah, the aliens are bound to have tons of post-doc archaeologists looking for faculty positions.
Besides, you have all the character bios for the full Marvel universe(s)...

I'm only half being sarcastic, as I've gotten lost in some of those articles trying to figure out who someone is... while I'm not sure of the encyclopedic value of those articles, they definitely have some entertainment value. For that matter, it may be worth distinguishing fictional characters from those that are historical, or based on historical events.

One concept that always got me is the premise of an alien culture without sarcasm. They would have a very hard time with human culture/history.

Perhaps, but surely some compression is reasonable. You could shorten common words to much shorter sequences of bits without losing any information.
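One standard way to give common words shorter bit sequences is a Huffman code built over word frequencies. The sketch below is an illustration of that general technique, not a proposal for what the disc should use; the sample sentence and the function names are made up.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(words):
    """Build a prefix-free bit-string code from word frequencies."""
    freq = Counter(words)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(n, next(tiebreak), {w: ""}) for w, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        # Prefix one subtree's codes with 0 and the other's with 1.
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

def encode(words, code):
    return "".join(code[w] for w in words)

def decode(bits, code):
    """Invert the code; this works because no codeword is a prefix of another."""
    inverse = {c: w for w, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical sample text, invented for this example.
text = "the cat and the dog and the bird saw the cat".split()
code = huffman_code(text)
bits = encode(text, code)
```

Note that the reader still needs the code table to decode anything, which is exactly the "one more thing to understand" problem raised above.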

Any civilization capable of retrieving the disk should understand enough about information theory to undo compression. Instructions can also be explained in detail with pictures at the beginning, uncompressed.

That's an interesting question. I don't know if Lincos, which is the most conspicuous attempt at creating a self-explanatory language for aliens, included any kind of compression. Self-explanatory compression seems difficult to achieve, though maybe once you have the notion of equality, you can start giving examples of equivalent plain and compressed texts.
Then any damage to these instructions will be critical.
Copies of it could be distributed throughout the data. However, I'm not sure what tolerance to damage is expected. How long is this disc supposed to last? How much damage will occur in that time? What's the tradeoff between compressing the data and its expected lifespan?
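As a minimal sketch of why scattered copies help: if each copy is damaged in different places, a byte-by-byte majority vote across the copies can recover the original. This assumes damage flips bytes in place rather than erasing or shifting them, which may well not match how a physical disc degrades. Real storage media generally use proper error-correcting codes (for example Reed-Solomon) instead of plain repetition. The instruction text and variable names are invented.

```python
from collections import Counter

def majority_vote(copies):
    """Recover a byte string from equally long, possibly damaged copies."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must have equal length")
    recovered = bytearray()
    for column in zip(*copies):
        # Pick the most common byte value at this position.
        byte, _ = Counter(column).most_common(1)[0]
        recovered.append(byte)
    return bytes(recovered)

# Hypothetical instruction header, invented for this example.
instructions = b"START HERE: each symbol is 8 bits"

# Simulate three copies, each corrupted at a different position.
copy_a = bytearray(instructions)
copy_a[0] = ord("X")
copy_b = bytearray(instructions)
copy_b[10] = ord("?")
copy_c = bytearray(instructions)
copy_c[-1] = ord("!")

recovered = majority_vote([bytes(copy_a), bytes(copy_b), bytes(copy_c)])
```

Three copies survive one bad byte per position; two copies damaged at the same spot would defeat it, so the answer to "how much damage" decides how many copies are worth the space.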
Encoding choice should cause problems too.
So go with the four-years-ago version.