by notimpotent 1483 days ago

My first thought upon reading this: what if DALL-E (or a similar AI) uncovers some kind of hidden universal language that is somehow more "optimal" than any existing language?

That is, anything could be described completely, and more succinctly than in any current spoken language.

Or maybe some kind of universal language that occurs naturally and that any semi-intelligent life can understand.

Fun stuff!

5 comments

extr 1483 days ago

This is kind of already what's happening inside the NN. You can think of intermediate layers in the network as talking to each other in "NN-ese", that is, translating from one form of representation (encoding) to another. At the final encoder layer, the input is maximally compressed (for that given dataset/model architecture/training regime). The picture of the dog (millions of pixels) is reduced to a few bits of information about what kind of dog it is, how it's posed, what color the background is, etc.

However, optimality of encoding is entirely relative to the decoding scheme used and your purposes. Obviously a matrix of numbers representing a summary of a paragraph can be in some sense "more compressed" than the English equivalent, but it's useless if you don't speak matrices. Similarly, you could invent an encoding scheme with Latin characters that is more compressed than English, but it's again useless if you don't know it or want to take the time to learn it. If we wanted we could make English more regular and easier to learn/compress, but we don't, for a whole bunch of practical/real life reasons. There's no free lunch in information theory. You always have to keep the decoder/reader in mind.
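
To make the "keep the decoder in mind" point concrete, here is a minimal Python sketch of a toy encoding that is shorter than English but only readable with the matching codebook. The codebook entries and function names are hypothetical, invented for illustration; this is not how a neural network encodes anything, just the same dependency on a shared decoder in miniature.

```python
# Hypothetical shared codebook: sender and reader must agree on it in advance.
CODEBOOK = {
    "the dog": "A",
    "is sitting": "B",
    "on a": "C",
    "green lawn": "D",
}


def encode(text, codebook):
    """Replace each known phrase with its short code (toy scheme, not robust)."""
    for phrase, code in codebook.items():
        text = text.replace(phrase, code)
    return text


def decode(message, codebook):
    """Expand codes back into phrases; unknown tokens pass through unchanged."""
    reverse = {code: phrase for phrase, code in codebook.items()}
    return " ".join(reverse.get(token, token) for token in message.split())


sentence = "the dog is sitting on a green lawn"
packed = encode(sentence, CODEBOOK)

print(packed)                    # short, but meaningless on its own
print(decode(packed, CODEBOOK))  # reader who knows the codebook
print(decode(packed, {}))        # reader who does not
```

The encoded message is much shorter than the sentence, but a reader without the codebook gets the codes back unchanged. The saving only exists relative to a decoder that both sides already share.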


sbierwagen 1483 days ago

Ithkuil (Ithkuil: Iţkuîl) is an experimental constructed language created by John Quijada.[1] It is designed to express more profound levels of human cognition briefly yet overtly and clearly, particularly about human categorization.

Meaningful phrases or sentences can usually be expressed in Ithkuil with fewer linguistic units than natural languages.[2] For example, the two-word Ithkuil sentence "Tram-mļöi hhâsmařpţuktôx" can be translated into English as "On the contrary, I think it may turn out that this rugged mountain range trails off at some point."[2]

https://en.wikipedia.org/wiki/Ithkuil


astrange 1483 days ago

That’s not possible - it’s like asking for a compression system that can compress any message.

All human languages convey information at about the same rate when spoken, but of course this mainly depends on having short enough words for the most common concepts in the specific thing you’re talking about.

https://www.science.org/content/article/human-speech-may-hav...

And there can’t be a universal language because the symbols (words) used are completely arbitrary even if the grammar has universal concepts.
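
The compression analogy above rests on a counting (pigeonhole) argument, which can be sketched in a few lines of Python. The function names are made up for this example, and the zlib run on seeded pseudo-random bytes is only an illustration of the argument, not a proof.

```python
import random
import zlib


def count_strings(n_bits):
    """Return (number of n-bit messages, number of strictly shorter bit strings)."""
    messages = 2 ** n_bits
    shorter = sum(2 ** k for k in range(n_bits))  # lengths 0 .. n_bits - 1
    return messages, shorter


def compress_random(n_bytes, seed=0):
    """Compress seeded pseudo-random bytes; return (raw size, compressed size)."""
    rng = random.Random(seed)
    data = bytes(rng.getrandbits(8) for _ in range(n_bytes))
    return len(data), len(zlib.compress(data, 9))


messages, shorter = count_strings(8)
print(messages, shorter)  # one more message than there are shorter strings

raw_size, packed_size = compress_random(4096)
print(raw_size, packed_size)
```

There are always 2^n messages of n bits but only 2^n - 1 bit strings that are shorter, so no lossless scheme can shrink every message. A compressor that shortens some inputs has to lengthen others, and patternless input is where a general-purpose compressor like zlib ends up adding overhead.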


elil17 1483 days ago

There are a couple sci-fi short stories in the book "Stories of Your Life and Others" by Ted Chiang which explore the idea that highly advanced intelligences might create special languages which accommodate special thoughts which we cannot easily think.


jcims 1483 days ago

I think something like this is actually quite likely.

I’ve been wondering if there is a way to do psychological experiments on these large language models that we couldn’t do with a person.


julianbuse 1483 days ago

I imagine these would be very interesting, but not very applicable to humans (which I presume is the intended outcome). OTOH, since these language models are trained on human language and media, they might have some value. I'm quite split on which I think is more likely (I don't have any experience in AI/ML or in psychology, so what do I know).


jcims 1483 days ago

One example of an "experiment" would be to explore the latent space with random/procedurally generated prompts and do semantic analysis on the results to see which topics or sentiments emerge.

My guess is that the current language models don’t have enough information in the training data to do this usefully today, but over time it seems potentially viable.
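
A rough Python sketch of what such an experiment loop could look like, under heavy assumptions: the word lists are invented, `fake_model` is a hypothetical stand-in that simply echoes the prompt (no real model API is called), and counting words is only a crude proxy for actual semantic analysis.

```python
import random
from collections import Counter

# Invented vocabulary for procedurally generated prompts.
SUBJECTS = ["a farmer", "two whales", "a robot"]
ACTIONS = ["talking about", "arguing over", "dreaming of"]
OBJECTS = ["vegetables", "the weather", "music"]
STOPWORDS = {"a", "the", "of", "over", "about", "two"}


def make_prompts(n, seed=0):
    """Build n prompts by sampling one subject, action, and object each."""
    rng = random.Random(seed)
    return [
        f"{rng.choice(SUBJECTS)} {rng.choice(ACTIONS)} {rng.choice(OBJECTS)}"
        for _ in range(n)
    ]


def fake_model(prompt):
    """Hypothetical placeholder for a real model call; it just echoes the prompt."""
    return prompt


def topic_counts(outputs):
    """Count non-stopword tokens across outputs as a crude topic signal."""
    counts = Counter()
    for text in outputs:
        counts.update(w for w in text.lower().split() if w not in STOPWORDS)
    return counts


prompts = make_prompts(50, seed=1)
counts = topic_counts(fake_model(p) for p in prompts)
print(counts.most_common(5))
```

With a real model in place of `fake_model`, the interesting part would be the gap between what the prompts asked for and which topics dominate the outputs. The stub cannot show that, since it returns exactly what it was given.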
