|
>> Most humans will be born, live and die inventing absolutely nothing, even those with the opportunity and resources to do so.
I don't think that's right at all. I like to visit museums. You really get hit in the face with the unending creativity of the human mind and the variety of all that human hands have crafted over thousands of years across hundreds of cultures. I would go as far as to say that the natural state of the human mind is to create new things all the time. And mathematics itself was not created (invented or discovered) by one person, but by many thousands.
In any case, it doesn't matter if one instance of the class of human minds hasn't invented anything, in the same way that it doesn't matter if one car can't do 80 mph. It's indisputable that we have the capacity for some novelty, and generality, in our thinking. Maybe not every member of the species will achieve the same things, but the fact is that the species, as a species, has the ability to come up with never-before-seen things: art, maths, tech, bad poetry, you name it.
>> Lecun may disagree but some others like Hinton, Ilya and Norvig don't.
I'm with LeCun and Bengio. There's a fair amount of confusion about what a "model" is in that sense: a theory of the world. There's no reason why LLMs should have that. Maybe a transformer architecture could develop a model of the world, but it would have to be trained on, well, the world, first.
Sutskever's bet is that a model can be learned from text generated by entities that already have a world model, i.e. us, but LeCun is right in pointing out that a lot of what we know about the world is never transmitted by text or language. I can see that in my work: I work with planning right now, where the standard thing is to create a model in some mathematical logic notation that is at once as powerful as human language and much more precise, and then let a planning agent make decisions according to that model.
It's obvious that, despite having rich and powerful notations available, there is information about the world that we simply don't know how to encode. That information will not be found in text, either. Sutskever again seems to think that, that kind of information, can somehow be guessed from the text, but that seems like a very tall order, and Transformers don't look like the right architecture. You need something that can learn hidden (latent) variables. Transformers can't do that. |
It does matter, depending on what claim you're making. We've not reached the upper bound of transformer ability. Until we clearly do, it very much does matter.
>I'm with LeCun and Bengio. There's a fair amount of confusion about what a "model" is in that sense: a theory of the world. There's no reason why LLMs should have that.
See, this is my problem with LeCun's arguments. He usually starts with the premise that it's not possible and works his way from there. If you disagree with the premise then there's very little left. "Well, it shouldn't be possible" is not a convincing argument, especially when we really have very little clue about the nature of intelligence.
>Sutskever's bet is that a model can be learned from text generated by entities that already have a world model, i.e. us, but LeCun is right in pointing out that a lot of what we know about the world is never transmitted by text or language.
A lot of the world is transmitted by things humans don't have access to. Wouldn't birds that can naturally sense electromagnetic waves to intuit direction say humans have no model of the world? Would they be right? Nobody is trained on the world. Everything that exists is trained on small slices of it. A lot of the world is transmitted by text and language. And if push comes to shove, text and language are not the only things you can train a transformer on.
>Sutskever again seems to think that, that kind of information, can somehow be guessed from the text, but that seems like a very tall order,
I don't think this is as tall an order as you believe.
>and Transformers don't look like the right architecture. You need something that can learn hidden (latent) variables. Transformers can't do that.
But they do this all the time.
A transformer trained only on protein sequences learns biological structure and function - https://www.pnas.org/doi/full/10.1073/pnas.2016239118
A toy example on binary addition (a transformer trained on the inputs and outputs of addition sequences) learns an algorithm for it - https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mec...
Unless I'm misunderstanding what you mean by hidden variables, it's very clear a transformer is regularly learning not just the sequences themselves but what might produce them.
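To make "hidden variable" concrete for the addition case, here is a minimal sketch. It is an illustration only, not code from either link, and the function names are hypothetical. The carry bit never appears in the input/output pairs, yet the same visible pair of input bits maps to different output bits depending on it, so a predictor that gets the output right arguably needs some representation of information equivalent to the carry.

```python
from itertools import product

def add_bits(a, b, carry):
    """One step of ripple-carry addition: returns (sum_bit, next_carry)."""
    total = a + b + carry
    return total % 2, total // 2

def add_binary(xs, ys):
    """Add two equal-length bit lists, least significant bit first."""
    carry, out = 0, []
    for a, b in zip(xs, ys):
        s, carry = add_bits(a, b, carry)  # carry is internal state, never emitted per step
        out.append(s)
    out.append(carry)  # final carry becomes the top output bit
    return out

def observed_outputs(width=3):
    """For each visible input pair at position 1, collect every output bit
    that shows up at position 1 across all possible addition examples."""
    seen = {}
    for bits in product([0, 1], repeat=2 * width):
        xs, ys = list(bits[:width]), list(bits[width:])
        out = add_binary(xs, ys)
        seen.setdefault((xs[1], ys[1]), set()).add(out[1])
    return seen

if __name__ == "__main__":
    for pair, outs in sorted(observed_outputs().items()):
        print(pair, "->", sorted(outs))
```

Every visible pair at position 1 is seen with both output bits, so the local inputs alone don't determine the output; the missing piece is the carry from the lower position, which is exactly the kind of variable that is in the generating process but not in the data.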