Hacker News
by mystified5016 460 days ago
Imagine I give you a text of any arbitrary length in an unknown language with no images. With no context other than the text, what could you learn?

If I told you the text contained a detailed theory of FTL travel, could you ever construct the engine? Could you even prove it contained what I told you?

Can you imagine that, given enough time, you'd recognize patterns in the text? Some sequences of glyphs usually follow other sequences. Eventually you could deduce a grammar and begin putting together strings of glyphs that seem statistically likely compared to the source.

You can do all the analysis you like and produce text that matches the structure and complexity of the source. A speaker of that language might even be convinced.
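
For what it's worth, the pattern-matching part is easy to sketch. Below is a minimal bigram model in Python, assuming the text has already been split into glyph tokens. The function names and the made-up "words" are purely illustrative, not taken from any real system. It records which glyph follows which, then emits a sequence in which every adjacent pair was seen in the source.

```python
import random
from collections import defaultdict


def build_bigram_model(glyphs):
    """Map each glyph to the list of glyphs observed immediately after it."""
    followers = defaultdict(list)
    for current, nxt in zip(glyphs, glyphs[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Emit up to `length` glyphs, each drawn from what followed the previous one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same sample
    out = [start]
    while len(out) < length:
        options = followers.get(out[-1])
        if not options:  # this glyph was never followed by anything in the source
            break
        # Duplicates in the list make frequent followers proportionally more likely.
        out.append(rng.choice(options))
    return out


# Hypothetical "unknown language": the tokens carry no meaning for us at all.
source_glyphs = "kalo mika sana kalo mika tuvo kalo sana mika".split()
model = build_bigram_model(source_glyphs)
sample = generate(model, "kalo", 8)
print(" ".join(sample))
```

The output is locally plausible by construction, yet nothing in the model connects "kalo" to anything outside the text, which is the point being argued here.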

At what point do you start building the space ship? When do you realize the source text was fictional?

There have been many untranslatable human languages across history, most famously ancient Egyptian hieroglyphs. We had lots and lots of source text, but all context relating the text to the world had been lost. It wasn't until we found a translation on the Rosetta Stone that we could understand the meaning of the language.

Text alone has historically proven to not be enough for humans to extract meaning from an unknown language. Machines might hypothetically change that but I'm not convinced.

Just think of how much effort it takes to establish bidirectional spoken communication between two people with no common language. You have to be taught the word for apple by being given an apple. There's really no exception to this.

2 comments

I'm optimistic about this. I think enough pictures of an apple, chemical analyses of the air, the ability to arbitrarily move around in space, a bunch of pressure sensors, or a bunch of senses we don't even have, will solve this. I suspect there might be a continuum, where understanding of concepts deepens with more senses. We're bathed in senses all the time, to the point where we have many systems just to block out senses temporarily, and to constantly throw away information (but different information at different times).

It's not a theory of consciousness, it's a theory of quality. I don't think that something can be considered conscious that is constantly encoding and decoding things into and out of binary.

A few GB worth of photographs of hieroglyphs? OK, you're going to need a Rosetta Stone.

A few PB worth? Relax, HAL's got this. When it comes to information, it turns out that quantity has a quality all its own.