HN Mirror

Quoting Chomsky at a symposium, speaking about statistical AI: http://languagelog.ldc.upenn.edu/myl/PinkerChomskyMIT.html

"I think there have been some successes, but a lot of failures. The successes, to my knowledge at least, are those that integrate statistical analysis with some universal grammar properties, some fundamental properties of language; when they're integrated, you sometimes get results [...] On the other hand there's a lot [of] work which tries to do sophisticated statistical analysis, you know Bayesian and so on and so forth, without any concern for the uh actual structure of language, as far as I'm aware that only achieves success in a very odd sense of success. There is a notion of success which has developed in computational cognitive science in recent years which I think is novel in the history of science. It interprets success as approximating unanalyzed data."

The model that has become dominant in statistical AI, which posits a strongly underconstrained basic grammar and eliminates spurious analyses through learned parameters rather than through "universal grammar" (i.e. presupposed innate structures), is something Chomsky has been very much against. At the same time, work that models grammar precisely enough to derive predictions from it (e.g. Ed Stabler's grammar implementation) is seen as nice-to-have but not central to the undertaking of generative grammar.
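To make the "underconstrained grammar plus learned parameters" idea concrete, here is a minimal sketch, not taken from any particular system: a toy probabilistic context-free grammar that licenses both attachments of the prepositional phrase in "I saw the man with the telescope", plus a Viterbi CKY parser that keeps only the more probable analysis. The grammar, the rule probabilities, and the name `viterbi_cky` are all hypothetical; a real statistical parser would estimate the probabilities from data (e.g. relative frequencies in a treebank) rather than have them written by hand.

```python
# Toy PCFG in Chomsky normal form. All probabilities are invented for
# illustration; the rules for each left-hand side sum to 1.
BINARY = {  # (lhs, left child, right child) -> probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.6,
    ("VP", "VP", "PP"): 0.4,  # PP attaches to the verb phrase
    ("NP", "NP", "PP"): 0.3,  # PP attaches to the noun phrase
    ("NP", "Det", "N"): 0.5,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {  # (lhs, word) -> probability
    ("NP", "I"): 0.2,
    ("V", "saw"): 1.0,
    ("Det", "the"): 1.0,
    ("N", "man"): 0.5,
    ("N", "telescope"): 0.5,
    ("P", "with"): 1.0,
}


def viterbi_cky(words):
    """Return (probability, tree) of the most probable parse rooted in S."""
    n = len(words)
    # chart[i][j] maps a nonterminal to (best probability, best subtree) over words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (lhs, word), p in LEXICAL.items():
            if word == w:
                chart[i][i + 1][lhs] = (p, (lhs, w))
    # Fill longer spans from shorter ones, keeping only the best subtree per label.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = chart[i][j]
            for k in range(i + 1, j):
                for (lhs, left, right), p in BINARY.items():
                    if left in chart[i][k] and right in chart[k][j]:
                        p_left, t_left = chart[i][k][left]
                        p_right, t_right = chart[k][j][right]
                        prob = p * p_left * p_right
                        if prob > cell.get(lhs, (0.0, None))[0]:
                            cell[lhs] = (prob, (lhs, t_left, t_right))
    return chart[0][n].get("S", (0.0, None))


prob, tree = viterbi_cky("I saw the man with the telescope".split())
print(prob)
print(tree)
```

With these made-up numbers the verb-phrase attachment scores 0.2 × 0.4 × 0.6 × 0.5⁴ = 0.003, against 0.00225 for the noun-phrase attachment, so the parser returns the former. The grammar itself rules out neither reading; only the parameters choose between them.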

And I think Chomsky put his finger right on the difference in goals: Chomsky defines progress in linguistics as work that posits the right ("universal") structures, and argues that these are cognitively real and innate, whereas statistical AI is more interested in predicting useful things with structures that may or may not correspond to anything cognitively real.

To people nowadays, the distinction between constrained "universal" models with few statistics and underconstrained "statistical" models seems very minor, since today's statistical models have a lot of structure, and people doing generative grammar aren't totally opposed to using statistics or optimality theory to select the most plausible structures. But back in the day, when the most expressive statistical models in use were HMMs [hidden Markov models, essentially probabilistic regular grammars] and PCFGs [probabilistic context-free grammars], the gap was much wider. Nowadays the models have become more similar, while the goals are still different.
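To illustrate the point that an HMM is essentially a probabilistic regular grammar, here is a minimal sketch with invented numbers. A hidden state is a part-of-speech tag, and each step "emit a word, then move to the next tag" behaves like a right-linear rule such as N -> dog V, weighted by the emission probability times the transition probability; an explicit "END" transition plays the role of a rule that stops the derivation. The forward algorithm sums over every tag sequence that could have produced the sentence. The tag set, probabilities, and the name `forward` are assumptions for illustration only.

```python
# Toy HMM over part-of-speech tags; all numbers are invented for illustration.
STATES = ["Det", "N", "V"]
START = {"Det": 0.6, "N": 0.3, "V": 0.1}
# P(next tag | current tag). "END" is a stop probability, which makes the model
# a distribution over finite word strings, like a probabilistic regular grammar.
TRANS = {
    "Det": {"Det": 0.0, "N": 1.0, "V": 0.0, "END": 0.0},
    "N": {"Det": 0.1, "N": 0.1, "V": 0.5, "END": 0.3},
    "V": {"Det": 0.7, "N": 0.2, "V": 0.0, "END": 0.1},
}
# P(word | tag); "saw" is ambiguous between a noun and a verb.
EMIT = {
    "Det": {"the": 1.0},
    "N": {"dog": 0.4, "cat": 0.4, "saw": 0.2},
    "V": {"saw": 1.0},
}


def forward(words):
    """Total probability of the word sequence, summed over all tag sequences."""
    # alpha[s] = P(words seen so far, current tag = s)
    alpha = {s: START[s] * EMIT[s].get(words[0], 0.0) for s in STATES}
    for w in words[1:]:
        alpha = {
            s: EMIT[s].get(w, 0.0) * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    # Multiply in the probability of stopping after the last word.
    return sum(alpha[s] * TRANS[s]["END"] for s in STATES)


print(forward("the dog saw the cat".split()))
```

Here "the dog saw the cat" can be produced by two tag sequences (Det N V Det N and Det N N Det N), and the forward pass adds their probabilities without enumerating them explicitly, which is what keeps it linear in sentence length.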