by comp_throw7
538 days ago
> We do understand how they work, we did build them. The mathematical foundation of these models are sound. The statistics behind them are well understood. We don't understand how they work in the sense that we can't extract the algorithms they're using to accomplish the interesting/valuable "intellectual" labor they're doing. i.e. we cannot take GPT-4 and write human-legible code that faithfully represents the "heavy lifting" GPT-4 does when it writes code (or pick any other task you might ask it to do). That inability makes it difficult to reliably predict when they'll fail, how to improve them in specific ways, etc. The only way in which we "understand" them is that we understand the training process which created them (and even that's limited to reproducible open-source models), which is about as accurate as saying that we "understand" human cognition because we know about evolution. In reality, we understand very little about human cognition, certainly not enough to reliably reproduce it in silico or intervene on it without a bunch of very expensive (and failure-prone) trial-and-error.
I think English is being a little clumsy here. At least I’m finding it hard to express what we do and don’t know.
We know why these models work. We know precisely how, physically, they come to their conclusions (it’s just processor instructions, as with all software).
We don’t know precisely how to describe what they do in a formalized, general way.
That is still very different from, say, an organic brain, where we barely know how it works even physically.
My opinions:
I don’t think they are doing much mental “labor.” My intuition likens them to search.
They seem to excel at retrieving information encoded in their weights through training and in the context.
They are not good at generalizing.
They also, obviously, are able to accurately predict tokens such that the resulting text is very readable.
Larger models have a larger pool of information, and that information is stored at a higher resolution, so to speak, since the larger, better-performing models have more parameters.
I think much of this talk of “consciousness” or “AGI” is very much a product of human imagination, personification bias, and marketing.