| We know how LLMs learn at the fundamental level. What we do not know is the actual dynamic process of encoding embeddings and their distributions. Your analogies about the PC and web browser are not correctly formulated, because in the case of the PC you talk about 'external components' (you should be talking about CPU architecture, structure, digital components, interfaces, etc.); in the case of the web browser, you should be talking about modules, code, etc. We do know how LLMs are laid out: layers, attention heads, etc. So what we need to look at are the fundamental possibilities of the structure of LLMs, not how the weights are distributed. > > And we also know that human beings do not hold 'internal representations' like any AI system needs to. > Bold fucking claim. Got a source on that? Part of the sources are in the books I mentioned. Nonetheless, you can still fact-check and refute in an adult and serious manner, not in a disrespectful and arrogant way. If my claim sounded arrogant I apologize, but as I already mentioned, my references back that claim. Regarding internal representations in the brain: I guess you are referring to areas of the brain being activated when a subject receives a stimulus, as tested through MRI. I would be cautious about causally relating stimuli to neuron activations, since you first need to know whether the exact configuration of the cells involved and their connections allows for such a representation (which I think is still not known -- again, AFAIK, the contrary seems to be the case). |
Yeah, no. I'm not walking that chain. If you want to, do it, but for now, I'm filing it as "has no evidence and knows it".
By now, there's plenty of work on this, up to and including direct neural interfaces. Utah arrays, Michigan arrays. Stab the brain, dump the spike trains, decode. You crack the manifold open by correlating to known stimuli using ML, and generalize from there to unknown stimuli. There is no need to "know the exact configuration", and few bother - you put your hardware into the part of the brain you want (the top-level map is consistent enough from brain to brain), gather a set of reference points, and use them to anchor the rest of the decoding process.
Why use ML? Because you need a very expressive correlator to bridge the gap between known inputs and the products of whatever transformations the brain subjects them to before they show up in spike trains.
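To make "correlate to known stimuli, then generalize" concrete, here's a toy sketch. Everything in it is an assumption for illustration: the spike counts are synthetic (Poisson with a made-up log-linear tuning), the "stimulus" is a hypothetical 2-D variable, and the correlator is plain ridge regression rather than the more expressive models real decoding work tends to use. It only shows the shape of the procedure: fit on reference trials, decode held-out ones, no wiring diagram required.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_neurons, n_features = 400, 64, 2

# Hypothetical "known stimuli": a 2-D variable per trial (say, cursor velocity).
stimuli = rng.normal(size=(n_trials, n_features))

# Made-up tuning: each synthetic neuron's rate depends log-linearly on the stimulus.
# The decoder below never gets to see this matrix.
tuning = rng.normal(size=(n_features, n_neurons))
rates = np.exp(1.5 + 0.5 * stimuli @ tuning)
spikes = rng.poisson(rates).astype(float)  # spike counts per trial and neuron

train, test = slice(0, 300), slice(300, None)

def add_bias(X):
    """Append a constant column so the decoder can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ridge(X, Y, lam=1.0):
    """Closed-form ridge regression (penalty also applied to the bias, for simplicity)."""
    X1 = add_bias(X)
    A = X1.T @ X1 + lam * np.eye(X1.shape[1])
    return np.linalg.solve(A, X1.T @ Y)

def predict(W, X):
    return add_bias(X) @ W

# Anchor the decoder on the reference trials, then decode trials it has not seen.
W = fit_ridge(spikes[train], stimuli[train])
pred = predict(W, spikes[test])

# Per-dimension correlation between decoded and true stimulus on held-out trials.
r = [np.corrcoef(pred[:, k], stimuli[test][:, k])[0, 1] for k in range(n_features)]
print("held-out correlation per stimulus dimension:", np.round(r, 3))
```

The decoder is fit purely from (spikes, stimulus) pairs. On real recordings the mapping is messier than this toy tuning, which is the argument for swapping the linear model for something more expressive.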
> So what we need to look at are the fundamental possibilities of the structure of LLMs, not how the weights are distributed.
And the fundamental possibilities are... what exactly? We know the I/O planes, we know the possible flow of information, now, what does that give us?
We know enough to prove that a transformer LLM can implement a Turing machine, the same way a CPU can implement a Turing machine. So an LLM is capable of performing arbitrary computation within its capacity. That's it. That's the upper bound.
What follows is: if you can represent "thinking" as a computational process, you can implement it with a Turing machine, and thus, an LLM can be made to think. That proves LLMs can think. But not that the existing ones do or don't! Because that's the entire thing about upper bounds!
We've looked at LLM architecture, and learned basically nothing about whether LLMs think, other than "it's not impossible". That's the actual "fundamental possibilities" we derived from knowing the architecture. One step above worthless. Oh fun.
(If thinking requires hypercomputation, then, nope. LLMs are out. Good luck proving that it does though.)