by YeGoblynQueenne 2657 days ago
This is a mostly meaningless semantic distinction. I can ask you to give a synonym for "king" and you might suggest ruler, lord, or monarch. I can ask a word2vec model for a synonym for "king" and it will provide similar suggestions. What "understanding" of the words' meanings do you have that the model lacks? Be specific!

Why do you need to admonish me to be specific?

word2vec can only represent meaning by mapping words to other words. I have a human understanding of language that goes well beyond that. For example, I don't need to limit myself to synonyms of king- I can use circumlocution: "a king is the hereditary monarch leading a monarchist nation". word2vec can tell you which of those words are close to king, in its model, but it can't put together this simple sentence that describes their relation.

Not to mention I can generate and recognise who knows how many more representations of the concept "king" than word2vec can. I can draw you a cartoon of a king, or rather, an unlimited number of them, each different than the other. I can sing you a song about kings. I can write you a poem. I can dance you an interpretive dance about kings.

I don't know if you really think that word2vec is really as good as a human at representing meaning, but, just in case: it's not even close.

>> Again, who cares? If it passes a relevant "turing test", what does your quibble about the internal representation not being meaningful enough to you matter? Clearly there's an internal representation that's powerful enough to be useful. Just because you can't understand it at first glance doesn't make it not real.

What is that internal representation?


> I don't know if you really think that word2vec is really as good as a human at representing meaning, but, just in case: it's not even close.

And I never said as such.

>Why do you need to admonish me to be specific?

Because I'm confident that for any particular definition of "understanding", the difference won't be relevant. Case in point, the one you provided. You're now claiming that a word2vec model doesn't have some "understanding" based on it being unable to demonstrate a specific skill (circumlocution/definition)[1]. All of your other objections follow the same general format. Because the word2vec model can't perform a skill that you can, its "intuitive" understanding of a concept must be lesser.

Following such an argument to its logical conclusion, you'd have to agree that you have a better intuitive understanding of language than a paralyzed person, because you can dance the word while they cannot. I doubt you actually hold such a belief.

So if the demonstration of an arbitrary skill isn't the marker of understanding, since that would be unfair to our quadriplegic linguist friends, perhaps performance on specifically relevant skills is how we should measure whether or not some model has the "understanding" you want. To be less abstract, given some embedding that we think has some "understanding" of some concept, we need to get the I/O right. If the same embedding can be placed in models that are wired up to interface with the world differently, but still perform well, perhaps the "understanding" is more than surface level.

Word2vec models clearly "understand" synonyms, antonyms, and similar word relations. Models based on word2vec-style word embeddings are also, I believe, still SoTA in automatic summarization and language translation tasks, although the machinery is fairly distinct from the original paper.

So what we have is a representation that can:

1. Show you which words are similar to which other words

2. Use that knowledge to summarize text

3. Use that knowledge to translate text to a different language

4. Be poked at by humans where we can find semantically meaningful clusters and patterns via tools like t-SNE.

>What is that internal representation?

For word2vec, for example, it's that the vector space the words are in clusters similar words. For this model, it's that the vector space clusters similar colored cards.
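To make "the vector space clusters similar words" concrete, here is a minimal sketch of a nearest-neighbour lookup by cosine similarity. The three-dimensional vectors and the `nearest` helper are made up for illustration; real word2vec vectors are learned from a corpus and have far more dimensions, so treat this as a picture of the idea rather than actual model output.

```python
import math

# Toy vectors, invented for illustration (an assumption, not trained output).
EMBEDDINGS = {
    "king":    [0.90, 0.80, 0.10],
    "monarch": [0.85, 0.75, 0.15],
    "ruler":   [0.80, 0.70, 0.20],
    "banana":  [0.10, 0.20, 0.90],
    "apple":   [0.15, 0.10, 0.85],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, k=2):
    """Return the k other words whose vectors are closest to `word`."""
    query = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    ranked = sorted(others, key=lambda w: cosine(query, EMBEDDINGS[w]), reverse=True)
    return ranked[:k]

print(nearest("king"))  # ['monarch', 'ruler']
```

Nothing in the table says "these are synonyms"; the neighbourhood structure of the vectors is the whole representation.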

For complex neural models, who knows. On the one hand, it would probably be very useful if we could glean useful structure from the internal representation, and indeed people are working on that[2]. But on the other hand, they're demonstrably useful even if we don't have a perfect understanding of the structure. And given that we don't understand how and why we humans understand concepts, that's fine for now.

Of course, all of this assumes that "understanding" is even the right word to use. There's a good argument to be made that a neural network cannot and never will "understand" anything, because that's only something that self-aware entities can do. But again, that's mostly a semantic distinction. If we're discussing the efficacy of word-embedding models and whether or not the representation of concepts in those embeddings is real or just...happenstance (I'm not really sure what you're going for there), the entire question of things like self-awareness is irrelevant.

[1]: I apologize for over-anthropomorphizing an ML model here, but it's the best way of putting this I can think of.

[2]: https://distill.pub/2019/activation-atlas/

What you say above, about understanding etc, doesn't make a lot of sense, sorry to say.

>> For this model, it's that the vector space clusters similar colored cards.

I mean- what is the representation you speak of in the previous comment. What data structure holds the model's understanding of M:tG colour? The source code is available online.

> I mean- what is the representation you speak of in the previous comment. What data structure holds the model's understanding of M:tG colour? The source code is available online.

The network's hidden layers. I can elaborate, but looking at your profile, you've implemented an LSTM before, so I shouldn't need to delve deeply into how that works. I'm honestly not sure where your confusion about, or aversion to, the idea that such models can learn to encode semantically meaningful concepts comes from.

The concept of "color" is never explicitly encoded anywhere by a human. The model infers clusterings based on the correlations between which cards are chosen. Unsurprisingly, given reasonable training data, those clusters form along useful boundaries in the game world, one of which is color. If you similarly passed a pack containing every card into the model, you'd likely get out the model's opinion on what the best limited card is. No one ever told it that, but based on the training data, the model "figures it out".
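A rough sketch of how clusters can fall out of pick correlations alone, with no colour label anywhere in the data. This is not the linked project's code: it swaps the neural network for plain co-pick counts, and the card lists, `DECKS`, and `most_similar` are hypothetical, chosen only to show the mechanism.

```python
import math
from collections import Counter

# Hypothetical draft picks (invented data). Each list is one player's picks.
# Note that no colour information is stored anywhere.
DECKS = [
    ["Shock", "Lava Axe", "Goblin"],
    ["Shock", "Goblin", "Lava Axe"],
    ["Counterspell", "Opt", "Merfolk"],
    ["Opt", "Merfolk", "Counterspell"],
    ["Shock", "Goblin", "Opt"],
]
CARDS = sorted({card for deck in DECKS for card in deck})

# Count how often each ordered pair of cards (including a card with itself)
# appears in the same set of picks.
pair_counts = Counter()
for deck in DECKS:
    for a in set(deck):
        for b in set(deck):
            pair_counts[(a, b)] += 1

def vector(card):
    """Represent a card by its co-pick counts against every card."""
    return [pair_counts[(card, other)] for other in CARDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(card, k=2):
    """Rank the other cards by cosine similarity of their co-pick vectors."""
    others = [c for c in CARDS if c != card]
    ranked = sorted(others, key=lambda c: cosine(vector(card), vector(c)), reverse=True)
    return ranked[:k]

print(most_similar("Shock"))         # ['Goblin', 'Lava Axe']
print(most_similar("Counterspell"))  # ['Merfolk', 'Opt']
```

In this toy data the two groups separate along what a player would call colour, even though the single mixed set of picks blurs the boundary slightly. A learned hidden layer is a much richer version of the same thing, but the grouping comes from the same source: which cards get chosen together.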

My comments on understanding can be summarized as such: either

1. You're of the opinion that nothing that isn't "strong AI" can have understanding, because understanding is some concept unique to conscious entities (or some reasonably similar opinion). This is an almost completely semantic argument, and isn't particularly interesting. It's an argument about definitions that avoids any actual useful academic questions.

2. You think that non-conscious entities can "understand" concepts, but deny that implicit understandings based on learned clusterings are "understanding". This is marginally more interesting, but wrong: if an implicit understanding can pass a "turing test", by which I mean that the statistical/learned model can perform as well as whatever you're comparing against at some task, whether that is a human or an expert system, then the two things have the same understanding when confined to that domain.

In other words, sure, saying a model doesn't "understand language" might be reasonable, because language is multifaceted. But suggesting that a model that outperforms humans on the synonym portion of the LSAT doesn't understand synonyms is silly. Of course it does. Better than humans. Sure, it can't express its understanding of synonyms as music or dance, but that's not because it lacks understanding of synonyms; that's because it lacks other basic faculties that we take for granted.

The question of whether or not you or I can introspect the model to see how its understanding is structured doesn't matter. I can't look inside your head to see how your understanding of language is structured. There's no ArrayList<WordDefinition> I can see in your mind. But I think anyone would agree that you and I both "understand" synonyms despite that lack of transparency. Why would you expect anything different from a statistical model?

>> My comments on understanding can be summarized as such: either

Please don't do this. Too many assumptions about what and how I think leave a bad taste.

Yes, a model that can identify synonyms accurately lacks human faculties, including understanding. That's what modern machine learning boils down to. There are many tasks we thought would require human intelligence or reasoning, that can, after all, be reduced to dumb classification. In other words, there is no need to claim "understanding" to explain the output of a classification model, just because a human can perform the same task _and_ can understand it.

As to the representation- that is the only thing that matters. If you want to claim a model represents a concept, you have to be able to show where in the model's structure that concept is represented. If there is a representation- where is it?

This response runs much too close to my position number one for me to have any more interest in continuing this line of discussion. I'll just leave you again with one question:

> If there is a representation- where is it?

Where is your understanding of language?

(HN thinks we've overdone it. This goes to your comment below.)

But, I'm a human being. Why do I need to show you my representation to convince you I possess human understanding of language?

Conversely, to claim that a statistical model possesses understanding is a very strong claim that requires equally strong evidence. And since we can inspect a statistical model's representation- that is where the evidence should be sought.

Why does that matter? Is there any doubt as to whether I can understand language? The question is whether word2vec etc can.