Hacker News
by maister 1176 days ago
I've been thinking a lot about the ability of neural networks to develop understanding and wanted to share my perspective on this. For me it seems absolutely necessary for a NN to develop an understanding of its training data.

Take Convolutional Neural Networks (CNNs) used in computer vision, for example. One can observe how the level of abstraction increases in each layer. It starts with detecting brightness transitions, followed by edges, then general shapes, and eventually specific objects like cars or houses. Through training, the network learns the concept of a car and understands what a car is.
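
As a rough sketch of the lowest level in that hierarchy, here is a hand-written "brightness transition" detector on a single row of pixels. This is only an illustration: in a trained CNN the kernel weights are learned rather than fixed by hand, and the names `convolve1d`, `row` and `edge_kernel` are made up for this example.

```python
# Toy illustration: a hand-written 1D "brightness transition" detector.
# In a trained CNN the kernel weights are learned; here they are fixed by hand.

def convolve1d(signal, kernel):
    """Valid-mode cross-correlation, the operation CNN layers apply."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# One row of pixel brightness values: a dark region followed by a bright one.
row = [0, 0, 0, 10, 10, 10]

# Difference kernel: responds only where neighbouring pixels differ.
edge_kernel = [-1, 1]

response = convolve1d(row, edge_kernel)
print(response)  # non-zero only at the dark-to-bright transition
```

Later layers would combine many such responses into edges, shapes and eventually object-level features, which this toy does not attempt to show.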

The same principle applies to Transformer networks in text processing. Instead of pixels, they process textual elements. Neurons in different layers learn to recognize complex relationships and understand abstract concepts.

1 comments

I mean, isn't this the whole point of large + deep NNs? To model complex relationships in data? It's odd so many people seem to deny this with GPT and try to trivialise what it does by saying, "it just predicts the next word".

This idea that GPT only works at the level of words and develops no deeper understanding of the concepts in language seems silly given its behaviour. And at the very least it's not what we observe from other NNs. As you point out, a CNN will find deeper relationships and patterns in images, so it's only reasonable to assume a very large language model would find deeper relationships in text data.

The only difference here is that, compared with other problems, text is how humans communicate and encode knowledge. The deeper relationships to be found in text are knowledge + reasoning.

I think we can say with some certainty that GPT models knowledge; the thing people are less sure about is whether it learns to reason.

My take on this is that the fact you can ask it stuff that it couldn't know, but it can still "reason" to the correct answer, strongly suggests that it must have some ability to reason on the knowledge it's acquired.

Here's a really dumb example:

Me: Daisy likes to go swimming on the weekend, but last week she swore at her brother and has been grounded. How does Daisy feel?

GPT: It's possible that Daisy may be feeling disappointed or frustrated since she is unable to go swimming, which is an activity that she enjoys. She may also feel regretful or guilty for swearing at her brother and for the consequences that followed.

This isn't knowledge regurgitation. GPT doesn't know who this made-up person is, so it can't simply regurgitate something it was trained on. The only explanation for behaviour like this is that GPT has modelled human emotion and can reason about it.

Here's my arbitrary line in the sand: if you give the prompt to a human, they could give a similar reply, but the prompt would also trigger other reactions such as:

* Who's Daisy?

* Why would Daisy do that?

* Daisy is rude.

etc. These imply the existence of some sort of abstract object into which relations and other facts can be plugged. For me, the existence of that abstract object is "reasoning."

We do not know if GPT is capable of forming abstract objects in its network, and I do not think it is reasonable to infer that from its text output. In my non-expert opinion, it seems possible that the output can be achieved via knowledge regurgitation through the use of sentiment analysis, word correlations, and grammar classification.

So in this framing, it's not reasoning about Daisy nor hallucinating facts. It's regurgitating knowledge about the relationship between sentiment, words, and grammar. (An interesting experiment to run would be to change 'Daisy' to a random noun or even nonsense tokens to see what would happen).
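
A minimal sketch of how the prompt variants for that experiment could be generated. It only builds the prompts; sending them to a model is left out, since that depends on whichever interface you use. The replacement subjects ("The teapot", "Xqzvlt") and the helper `build_variants` are made up for illustration.

```python
# Build variants of the Daisy prompt with the subject swapped out.
# The substitute subjects below are hypothetical picks, not from any dataset.

TEMPLATE = (
    "{name} likes to go swimming on the weekend, but last week she swore at "
    "her brother and has been grounded. How does {name} feel?"
)

def build_variants(subjects):
    """Return a dict mapping each subject to its filled-in prompt."""
    return {subject: TEMPLATE.format(name=subject) for subject in subjects}

# Original name, a random noun phrase, and a nonsense token.
subjects = ["Daisy", "The teapot", "Xqzvlt"]
variants = build_variants(subjects)

for subject, prompt in variants.items():
    print(f"[{subject}] {prompt}")
```

If the replies stay essentially the same across all three, that would be consistent with the subject not mattering much; it would not by itself settle whether an abstract object is being formed.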

You might argue that the ability to mechanically model that relationship counts as reasoning, and that's a stance I won't outright dismiss. However, it does seem strictly less powerful than mechanically modeling on top of abstract objects.

> This isn't knowledge regurgitation.

What makes you say that?

Why do you think it's "reasoning" an answer, instead of looking up that people being grounded makes them frustrated?
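
To make the "looking up" alternative concrete, here is a deliberately crude sketch. The cue table is invented for this example and nobody is claiming GPT works this way; the point is only that a pure lookup produces the same feelings no matter who the subject is.

```python
# Hypothetical cue table, made up for illustration only.
CUE_TO_FEELINGS = {
    "grounded": ["disappointed", "frustrated"],
    "swore": ["regretful", "guilty"],
}

def lookup_feelings(prompt):
    """Collect feelings associated with cue words; the subject is never used."""
    words = [w.strip(".,?!").lower() for w in prompt.split()]
    feelings = []
    for cue, associated in CUE_TO_FEELINGS.items():
        if cue in words:
            feelings.extend(associated)
    return feelings

daisy_prompt = (
    "Daisy likes to go swimming on the weekend, but last week she swore at "
    "her brother and has been grounded. How does Daisy feel?"
)
# Same prompt with the name swapped for a nonsense token.
nonsense_prompt = daisy_prompt.replace("Daisy", "Xqzvlt")

daisy_feelings = lookup_feelings(daisy_prompt)
nonsense_feelings = lookup_feelings(nonsense_prompt)
print(daisy_feelings)
print(daisy_feelings == nonsense_feelings)
```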

Right, in this scenario I think it's more that. Who Daisy is (or whether Daisy even exists) is irrelevant to formulating a response.

Which is still impressive!