|
|
|
|
|
by gitaarik
382 days ago
|
|
I think it's not just token support, it's also having a understanding of certain concepts that allows you to arrive at new points like C, D, E, etc. But LLM's don't have an understanding of things, they are statistical models that predict what statistically is most likely following the input that you give it. But that that will always be based on already existing data that is fed into the model. It can produce "new" stuff only by combining the "old" stuff in new ways, but it can't "think" of something entirety conceptionally new, because it doesn't really "think". |
|
Hierarchical optimization (fast global + slow local) is a precise, implementable notion of "thinking." Whenever I've seen this pattern implemented, humans, without being told to do so by others in some forced way, seem to converge on the use of verb think to describe the operation. I think you need to blacklist the term think and avoid using it altogether if you want to think clearly about this subject, because you are allowing confusion in your use of language to come between you and understanding the mathematical objects that are under discussion.
> It can produce "new" stuff only by combining the "old" stuff in new ways,
False premise; previously debunked. Here is a refutation for you anyway, but made more extreme. Instead of modeling the language task using a pre-training predictive dataset objective, only train on a provided reward model. Such a setup never technically shows "old" stuff to the AI, because the AI is never shown stuff explicitly. It just always generates new things and then the reward model judges how well it did. Clearly, the fact that it can do generation while knowing nothing, shows that your claim that it can never generate something new -- by definition everything would be new at this point -- is clearly false. Notice that as it continually generates new things and the judgements occur, it will learn concepts.
> But LLM's don't have an understanding of things, they are statistical models that predict what statistically is most likely following the input that you give it.
Try out Jayne's Probability Theory: The Logic Of Science. Within it the various underpinning assumptions that lead to probability theory are shown to be very reasonable and normal and obviously good. Stuff like represent plausibility with real numbers, keep rankings consistent and transitive, reduce to Boolean logic at certainty, and update so you never accept a Dutch-book sure-loss -- which together force the ordinary sum and product rules of probability. Then notice that statistics is in a certain sense just what happens when you apply the rules of probability.
> also having a understanding of certain concepts that allows you to arrive at new points like C, D, E, etc. But LLM's don't have an understanding of things
This is also false. Look into the line of research that tends to go by the name of Circuits. Its been found that models have spaces within their weights that do correspond with concepts. Probably you don't understand what concepts are -- that abstractions and concepts are basically forms of compression that let you treat different things as the same thing -- so a different way to arrive at knowing that this would be true is to consider a dataset with less parameters than there are items in the dataset and notice that the model must successfully compress the dataset in order to complete its objective.