by eddythompson80 330 days ago
There is nothing really special about speech as a form of communication. All animals communicate with each other and with other animals. Informational density and, uhhhhh, cyclomatic complexity might be different between speech and a dance or a grunt or whatever.

I was referencing Wittgenstein's "If a lion could speak, we would not understand it." Wittgenstein believed (and I am strongly inclined to agree with him) that our ability to convey meaning through communication was intrinsically tied to (or, rather, sprang forth from) our physical, lived experiences.

So, to your point: if "there's nothing really special about speech", does that mean we would be able to understand a lion if the lion could speak? Wittgenstein would say probably not, at least not initially, and not until we had built shared lived experiences.

If we had a sufficiently large corpus of lion-speech we could build an LLM (Lion Language Model) that would “understand” as well as any model could.

Which isn’t saying much: it still couldn’t explain Lion Language to us. It could only generate statistically plausible examples or recognize them.

To translate Lion speech you’d need to train a transformer on a parallel corpus of Lion to English, the existence of which would require that you already understand Lion.

Hmm, I don't think we'd need a Rosetta Stone. In the same way that LLMs learn the meaning of words purely from contextual usage, two separate datasets of lion and English, encoded into the same vector space, might pick up patterns of contextual usage at a high enough level to allow mapping between the two languages.

For example, given thousands of English sentences containing the word "sun", the vector embedding encodes its meaning. Assuming the lion word for "sun" is used in much the same context (near the lion words for "hot", "heat", etc.), it would likely end up in a similar spot to the English word "sun". And because of our shared context (living on Earth, being animals), I reckon many words will likely be used in similar contexts.
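To make the "meaning from context" idea concrete, here's a toy sketch. It assumes plain co-occurrence counts rather than a trained model like word2vec, and the corpus, window size, and function names are all made up for illustration. Words that show up next to "hot" and "heat" end up with similar vectors:

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            ctx = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    ctx[words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: "sun" and "fire" share contexts, "river" mostly doesn't.
corpus = [
    "the hot sun gives heat",
    "the hot fire gives heat",
    "the cold river gives water",
]
vecs = context_vectors(corpus)
print(cosine(vecs["sun"], vecs["fire"]))   # 1.0 (identical contexts)
print(cosine(vecs["sun"], vecs["river"]))  # 0.5 (only "the" and "gives" shared)
```

Real embeddings compress counts like these into dense vectors, but the intuition is the same. Note that each vector still only lives in the space built from its own corpus.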

That's just my guess, though; note that I don't know a ton about the internals of LLMs.

Someone more knowledgeable might chime in, but I don't think two corpora can be mapped into the same vector space. Wouldn't each vector space be derived from its own corpus?

It depends how you define the vector space, but I'm inclined to agree.

The reason I think this is evidence from human language. Spend time with any translator and they'll tell you that some things just don't really translate. The main concepts might, but there are subtleties and nuances that really change the feel. You've probably noticed this with friends whose native language differs from yours.

Even same-language communication is noisy. You even misunderstand your friends and partners, right? And they're the people with the best chance of understanding you. That's because the words you say don't convey everything in your head. It's heavily compressed, and the listener has to decompress it from those lossy words. I mean, you can go to any internet forum and see this in action: there's more than one way to interpret anything, and it seems most internet fights start this way. So it's good to remember that there's no such thing as objective communication. We imperfectly encode as well as imperfectly decode. It's on us to try to find out what the speaker means, which may be very different from the words they say (take any story or song for more extreme versions of this; art uses this feature heavily).

Really, that comes down to the idea of a universal language[0]. I'm not a linguist (I'm an AI researcher), but my understanding is that most people don't believe it exists, and I buy the arguments. The evidence is hard to decouple from our shared origins and experiences.

[0] https://en.wikipedia.org/wiki/Universal_language

Hmm, I don't think being able to translate without a Rosetta Stone implies a universal language. I agree that there's no such thing as a universal language, per se, but I do wonder whether there's a notion of one at a certain level of abstraction.

But I think those ambiguous cases can still be understood and defined. You can describe how one lion word doesn't neatly map to a single English word and is used in a few different ways, some of which we might not have a word for in English, in which case we would likely adopt the lion word.

Note, though, that I think I was wrong about embedding a multilingual corpus into a single space. The example I had in mind was word2vec, which appears to work with only one language at a time. I did find some papers showing that you can align two separate spaces without supervision, but I don't know how successful that is, or how it would treat these ambiguous cases.

That's a very good point! I hadn't thought of that. And that makes sense, since the encoding of the word "sun" arises from its linguistic context, and there's no such shared context between the English word sun and any lion word in this imaginary multilingual corpus, so I don't think they'd go to the same point.

Apparently one thing you could do is train a word2vec model on each corpus and then align the two based on proximity/distances. This is called "unsupervised" alignment, and there's a tool from Facebook called MUSE that does it. (TIL, thanks ChatGPT!) https://github.com/facebookresearch/MUSE?tab=readme-ov-file
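Here's a rough sketch of why alignment without a Rosetta Stone can work at all. To be clear, this is not what MUSE does (MUSE learns a mapping adversarially and then refines it with Procrustes); it's a simpler, hypothetical heuristic. It assumes the two spaces have roughly the same shape, so each word's sorted distances to every other word (its "distance profile") survive the change of space, and it pairs up words whose profiles match. The "lion" words and all coordinates are invented; the lion space here is just the English one rotated 90 degrees:

```python
from math import dist  # Euclidean distance, Python 3.8+


def distance_profile(space, word):
    """Sorted distances from `word` to every other word in the same space."""
    return sorted(dist(space[word], vec) for other, vec in space.items() if other != word)


def match_words(space_a, space_b):
    """Pair each word in A with the word in B whose distance profile is closest (L1)."""
    profiles_b = {w: distance_profile(space_b, w) for w in space_b}
    matches = {}
    for word_a in space_a:
        profile_a = distance_profile(space_a, word_a)
        matches[word_a] = min(
            profiles_b,
            key=lambda w: sum(abs(x - y) for x, y in zip(profile_a, profiles_b[w])),
        )
    return matches


# Hypothetical 2-D "embeddings". The lion space is the English space rotated by
# 90 degrees, (x, y) -> (-y, x), with different labels, so distances are unchanged.
english = {"sun": (0.0, 0.0), "hot": (1.0, 0.0), "heat": (1.0, 0.5), "prey": (4.0, 3.0)}
lion = {"rrr": (0.0, 0.0), "grr": (0.0, 1.0), "hrr": (-0.5, 1.0), "mrr": (-3.0, 4.0)}

print(match_words(english, lion))
# {'sun': 'rrr', 'hot': 'grr', 'heat': 'hrr', 'prey': 'mrr'}
```

Real embedding spaces are nowhere near exact rotations of each other, and ambiguous words like the ones discussed above break the "same shape" assumption, which is exactly where this kind of matching gets shaky.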

Although I wonder if there are better embedding approaches now as well. Word2Vec is what I played around with a few years ago; I'm sure it's ancient now!

Edit: that's what I get for posting before finishing the article! The whole point of their research is to try to build such a mapping, vec2vec!

And even assuming a Lion-to-English corpus existed, it would only give us human-word approximations. We already experience how lossy that kind of translation is between human languages, or sometimes even between dialects of the same language.

Who knows? We don't really have good insight into how this information loss, or disparity, grows. Is it linear? Exponential? Presumably there is a threshold beyond which we simply can't translate while retaining a meaningful amount of the original meaning.

Would we know it when we tried to go over that threshold?

Sorry, I know I'm rambling. But it's always been on my mind, and it's easy for me to get on a roll. All this LLM stuff only kicked it into overdrive.

You might find https://www.lojban.org/files/why-lojban/whylojb.txt interesting. It isn't really about your Wittgenstein quote, but it does contain this:

> In broad terms, the Hypothesis claims that the limits of the language one speaks are the limits of the world one inhabits (also in Wittgenstein), that the grammatical categories of that language define the ontological categories of the word, and that combinatory potentials of that language delimit the complexity of that world (this may be Jim Brown's addition to the complex Hypothesis.) The test then is to see what changes happen in these areas when a person learns a language with a new structure, are they broadened in ways that correspond to the ways the structure of the new language differs from that of the old?

That seems a bit extreme, given that a lion also has a mammal brain. I'd expect it to also think in terms of distinct entities that can move around in the environment and possibly talk about things like "hunger" and "prey".

I'd expect incomprehensible language from something that is wildly different from us, e.g. sentient space crystals that eat radiation.

On the other hand, we still haven't figured out dolphin language (the most interesting guess was that they shout 3D images at each other).

Observation has proven sufficient to understand the meaning of animal calls. For example, researchers confirmed they had correctly identified an animal's distress call for assistance by playing it to its peers in the wild: they went looking for the distressed animal. Other calls didn't provoke the same reaction.

Analogies are always possible. I believe, though, that in the philosophical context, understanding the meaning of something is not possible through analogy alone.

Reminds me of the quote:

“But people have an unfortunate habit of assuming they understand the reality just because they understood the analogy. You dumb down brain surgery enough for a preschooler to think he understands it, the little tyke’s liable to grab a microwave scalpel and start cutting when no one’s looking.”

― Peter Watts, Echopraxia

There was no analogy in there.
Correlate the vocalizations with the subsequent behavior. I believe this has been done for some species in certain situations.
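At its simplest, "correlate the vocalizations with the subsequent behavior" means estimating how often each behavior follows each call. A minimal sketch, with made-up call labels and observation data (real studies also need controls, like the playback experiments mentioned above, since correlation alone doesn't show the call caused the behavior):

```python
from collections import Counter, defaultdict


def behavior_given_call(observations):
    """Estimate P(behavior | call) from (call, behavior-that-followed) pairs."""
    counts = defaultdict(Counter)
    for call, behavior in observations:
        counts[call][behavior] += 1
    return {
        call: {behavior: n / sum(seen.values()) for behavior, n in seen.items()}
        for call, seen in counts.items()
    }


# Hypothetical field notes: which call was heard, and what nearby animals did next.
observations = [
    ("distress", "approach"), ("distress", "approach"),
    ("distress", "approach"), ("distress", "ignore"),
    ("contact", "ignore"), ("contact", "ignore"),
    ("contact", "approach"), ("contact", "ignore"),
]
table = behavior_given_call(observations)
print(table["distress"])  # {'approach': 0.75, 'ignore': 0.25}
print(table["contact"])   # {'ignore': 0.75, 'approach': 0.25}
```

A call whose follow-up behavior differs sharply from the others (here, "distress" mostly draws approaches) is a candidate for having a specific meaning.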

It's also pretty much how humans acquire language. No one is born knowing English or Spanish or Mandarin.

Hmm, I'm not convinced we don't have a lot of shared experience. We live on the same planet. We both hunger, eat, and drink. We see the sun, the grass, the sky. We both have muscles that stretch and contract. We both sleep and yawn.

I mean, who knows, maybe their perception of these shared experiences would be different enough to make communication difficult, but still, I think it's undeniably shared experience.

That's fair. To me, though, the point of Wittgenstein's lion thought experiment was not necessarily to say that _any_ communication would be impossible, but to ask whether we could understand what it truly meant to be a lion, not just what it meant to be an animal. We have no shared lion experiences, nor does a lion have human experiences. So would we be able to have human-to-lion communication even if we could both speak human speech?

I think that's the core question being asked and that's the one I have a hard time seeing how it'd work.

Hmm, I'm finding the premise, "understand what it truly meant to be a lion", a bit confusing. I think that's quite different from having meaningful communication. One could make the same argument about "truly understanding" what it means to be someone else.

My thinking is that if something is capable of human-style speech, then we'd be able to communicate with them. We'd be able to talk about our shared experiences of the planet, and, if we're capable of human-style speech, likely also talk about more abstract concepts of what it means to be a human or lion. And potentially create new words for concepts that don't exist in each language.

I think the fact that human speech is capable of abstract concepts, not just concrete concepts, means that shared experience isn't necessary to have meaningful communication? It's a bit handwavy, depends a bit on how we're defining "understand" and "communicate".

> I think the fact that human speech is capable of abstract concepts, not just concrete concepts, means that shared experience isn't necessary to have meaningful communication?

I don't follow that line of reasoning. To me, in that example, you're still communicating with a human, who regardless of culture, or geographic location, still shares an immense amount of shared life experiences with you.

Or they're not. To take an intentionally extreme example, I bet we'd have a super hard time talking about homotopy type theory with someone from the Amazon rainforest. Similarly, I'd bet they have their own abstract concepts that they couldn't easily explain to us.

I would say there's a difference between abstract and complex. A complex topic would take a lot of time to communicate mainly because you have to go through all the prerequisites. By abstract I mean something like "communicate" or "loss" or "zero"! The primitives of complex thought.

And if we're saying the lion can speak human, then I think it follows that they're capable of this abstract thought, which is what I think is making the premise confusing for me. Maybe if I change my thinking and let's just say the lion is speaking... But if they're speaking a "language" that's capable of communicating concrete and abstract concepts, then that's a human-style language! And because we share many concrete concepts in our shared life experience, I think we would be able to communicate concrete concepts, and then use those as proxies to communicate abstract concepts and hence all concepts?

Does a lion not know what it's like to be hungry? These parts of the brain are ancient. There is clearly a sliding scale for most experiences here, from amoeba to fly to lion to human. Would you like to communicate with a girl who drinks tapioca milk tea? Clearly your life experiences are different, so what's the point? Obviously it gets harder; that's why we're discussing the possibility of using technology to make it easier.

Obviously it's impossible to communicate even 90% of human experience with lions or with people with mental disabilities. But if a translation model improves communication by even 1%, bringing everybody up to the level of a Kevin Richardson, it's a huge win, e.g. a pair of smart glasses that labels a cat's mood. Nobody cares about explaining to a lion why humans wear hats, and of course no explanation beats being an old human who has worn hats for a variety of reasons.

I think it's unlikely you could make an LLM that gives a lion knowledge via audio only, but it's very possible for other animals.