I’m not really sure what you’re getting at. Could you point to some papers exemplifying the kind of work that you’re thinking of? Of course there are lots of people training LLMs and other statistical models on EEG data, but that does not show that, say, GPT-5 is a good model of any aspect of human cognition. Chomsky, of course, never attempted to model the generation of natural language and was interested in a different set of problems, so LLMs are not really a competitor in that sense anyway (even if you take the dubious step of accepting them as scientific models). I certainly don’t agree with Norvig, but he doesn’t really understand the basics of what Chomsky is trying to do, so there is not much to respond to. To give three specific examples, he (i) is confused in thinking that Gold’s theorem has anything to do with Chomsky’s arguments, (ii) appears to think that Chomsky studied the “generation of language” (because he’s read so little of Chomsky’s work that he doesn’t know what a “generative grammar” is), and (iii) believes that Chomsky thinks that natural languages are formal languages in which every possible sentence is either in the language or not (again because he’s barely read anything that Chomsky wrote since the 1950s). Then, just to make absolutely sure not to be taken seriously, he compares Chomsky to Bill O’Reilly! On point (iii), see http://www.linguistics.berkeley.edu/~syntax-circle/syntax-gr..., and the last complete paragraph of p. 145.
If you believe that some of human cognition is linguistic (even if, e.g., inner monologue and spoken language are just the surface of deeper, less conscious processes), then, yes, we might say LLMs can predictively model some aspects of human cognition. But, again, they are certainly not causal models, and they are not predictive models of human cognition generally (as cognition is clearly far, far more than linguistic).
* I avoid calling LLMs "statistical" because they really aren't even that. They are not calibrated, and putting a softmax and a log-loss on top doesn't magically make your model statistical (especially since ad-hoc regularization methods, other loss functions, and other simplex mappings, e.g. sparsemax, often work better and then violate the assumptions needed to prove these things behave statistically). LLMs are more accurately described as doing (very, very fancy and impressive) curve/manifold-fitting.
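
To make the sparsemax point concrete, here is a minimal sketch in plain Python. It assumes the usual definition of sparsemax as the Euclidean projection of the logits onto the probability simplex; the function names and the example logits are mine, purely for illustration. It shows only one narrow thing: sparsemax can assign exactly zero probability to a class, so the ordinary log-loss is no longer finite there, which is one reason such mappings get paired with a different loss.

```python
import math

def softmax(z):
    # Standard softmax, shifted by the max for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(z):
    # Euclidean projection of z onto the probability simplex.
    # The support condition holds for a prefix of the sorted logits,
    # so the last j that satisfies it determines the threshold tau.
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, v in enumerate(zs, start=1):
        cumsum += v
        if 1 + j * v > cumsum:
            tau = (cumsum - 1) / j
    return [max(v - tau, 0.0) for v in z]

def log_loss(probs, target):
    # Negative log-likelihood of the target class; infinite if p == 0.
    p = probs[target]
    return math.inf if p == 0.0 else -math.log(p)

logits = [1.0, 0.5, -1.0]  # hypothetical example logits
soft = softmax(logits)
sparse = sparsemax(logits)

print(soft)                 # every entry strictly positive
print(sparse)               # last entry is exactly zero
print(log_loss(soft, 2))    # finite
print(log_loss(sparse, 2))  # inf
```

Both outputs lie on the simplex, but only the softmax output keeps every class at nonzero probability, which is what the log-loss needs.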