by canjobear 2203 days ago

Chomsky was arguing that probability is useless for defining and studying grammaticality.

I'm not so sure. GPT-2 says

log P("Colorless green thoughts sleep furiously.") = -53.64797019958496

log P("Furiously sleep thoughts green colorless.") = -65.46656107902527

The ungrammatical one has lower probability. But those are famous sentences, probably present in the training data, so let's try

log P("Colorless blue ideas hibernate angrily.") = -60.12953460030258

log P("Angrily hibernate ideas blue colorless.") = -70.02637100033462
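
For context, a sentence score like the ones above is a sum of per-token conditional log-probabilities (the chain rule). Below is a minimal sketch of that sum, using a tiny add-one-smoothed bigram model as a stand-in for GPT-2. The corpus and the names `train_bigram` and `sentence_logprob` are made up for illustration, and the values it prints are unrelated to the GPT-2 numbers above.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])                 # every token that can be predicted
    return contexts, bigrams, vocab

def sentence_logprob(sentence, contexts, bigrams, vocab):
    """log P(w_1..w_n) = sum_i log P(w_i | w_{i-1}), with add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
        total += math.log(p)
    return total

# Hypothetical toy corpus; a stand-in for real training data.
corpus = [
    "green ideas sleep",
    "blue ideas sleep quietly",
    "colorless ideas sleep furiously",
    "angry thoughts hibernate",
]
contexts, bigrams, vocab = train_bigram(corpus)
forward = sentence_logprob("colorless ideas sleep furiously", contexts, bigrams, vocab)
backward = sentence_logprob("furiously sleep ideas colorless", contexts, bigrams, vocab)
print(forward, backward)  # the attested word order scores higher
```

A neural model replaces the bigram lookup with a conditional over the whole preceding context, but the summation is the same.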

0xddd 2203 days ago

I think the more interesting result (and more relevant to Chomsky's point) would be to work in the other direction. If you instead produce a list of sentences with similar log probabilities you will see that it contains a mix of grammatical and ungrammatical utterances. This implies something more is needed to distinguish them.

canjobear 2203 days ago

> If you instead produce a list of sentences with similar log probabilities you will see that it contains a mix of grammatical and ungrammatical utterances.

Yes, Chomsky mentions this in a footnote. But as far as I know, it hasn't been tried with modern language models.

There's been some interesting work that tries to reproduce grammaticality judgments in terms of language model probability after controlling for length and lexical content. It turns out it works pretty well. For instance https://arxiv.org/pdf/1910.14659.pdf
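
One normalization of this kind from the literature is usually called SLOR (syntactic log-odds ratio): subtract the sentence's unigram log-probability to discount rare words, then divide by the token count to discount length. The sketch below is not necessarily what the linked paper does; the function name `slor` and all the numbers passed to it are made up for illustration.

```python
def slor(model_logprob, unigram_logprobs):
    """(log P_model(s) - log P_unigram(s)) / |s|.

    unigram_logprobs holds one log-probability per token of the sentence.
    """
    if not unigram_logprobs:
        raise ValueError("sentence must have at least one token")
    return (model_logprob - sum(unigram_logprobs)) / len(unigram_logprobs)

# Hypothetical numbers: two five-token sentences with the same model score.
rare = slor(-60.0, [-12.0, -11.0, -13.0, -12.0, -12.0])   # built from rare words
common = slor(-60.0, [-6.0, -5.0, -7.0, -6.0, -6.0])      # built from common words
print(rare, common)  # 0.0 -6.0
```

With identical raw scores, the sentence built from rare words comes out ahead, because its low probability is already explained by its vocabulary.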

0xddd 2203 days ago

I wish there were a freely available copy online that I could link to, but the passage is at the end of chapter 2 of Syntactic Structures. It's not a footnote, but rather the crux of his argument, I believe:

> "... a structural analysis cannot be understood as a schematic summary developed by sharpening the blurred edges in the full statistical picture. If we rank the sequences of a given length in order of statistical approximation to English, we will find both grammatical and ungrammatical sequences scattered throughout the list; there appears to be no particular relation between order of approximation and grammaticalness. Despite the undeniable interest and importance of semantic and statistical studies of language, they appear to have no direct relevance to the problem of determining or characterizing the set of grammatical utterances. I think that we are forced to conclude that grammar is autonomous and independent of meaning, and that probabilistic models give no particular insight into some of the basic problems of syntactic structure."

I do think it's an important point for people to recognize. Scientific theories don't arise on their own out of large-scale statistical analyses. There is a lot of faith being put in deep learning methods these days, which are great for prediction, but not inference.

canjobear 2203 days ago

Thanks for pasting the whole thing. It's an interesting argument. The core empirical claim is

> If we rank the sequences of a given length in order of statistical approximation to English, we will find both grammatical and ungrammatical sequences scattered throughout the list; there appears to be no particular relation between order of approximation and grammaticalness.

It's not at all clear that this would be true with modern language models, after you control for (1) the length of the sentence and (2) the words in the sentence (as in the paper I linked above).
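
One way to put a number on "scattered throughout the list" is to ask how often a grammatical sequence outranks an ungrammatical one, which is the AUC of the ranking. Here is a rough sketch using only the four GPT-2 scores from the top of this thread, with the original word order labeled grammatical and the reversed order ungrammatical. The function name `ranking_auc` is made up, and four sentences are far too few to test the claim.

```python
def ranking_auc(scored):
    """Fraction of (grammatical, ungrammatical) pairs in which the grammatical
    sentence has the higher score; ties count half.
    1.0 means perfectly separated, about 0.5 means scattered at random."""
    good = [score for score, grammatical in scored if grammatical]
    bad = [score for score, grammatical in scored if not grammatical]
    if not good or not bad:
        raise ValueError("need at least one sentence of each kind")
    wins = sum((g > b) + 0.5 * (g == b) for g in good for b in bad)
    return wins / (len(good) * len(bad))

# (log-probability, is_grammatical) for the four sentences quoted in the thread.
scores = [
    (-53.64797019958496, True),    # Colorless green thoughts sleep furiously.
    (-65.46656107902527, False),   # Furiously sleep thoughts green colorless.
    (-60.12953460030258, True),    # Colorless blue ideas hibernate angrily.
    (-70.02637100033462, False),   # Angrily hibernate ideas blue colorless.
]
auc = ranking_auc(scores)
print(auc)  # 1.0
```

A real check would need many sequences of matched length and vocabulary, scored by the same model, with grammaticality labels assigned independently of the scores.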

0xddd 2203 days ago

I will have to take a look at that paper. I didn't catch your edit before replying. It would certainly be worthwhile to verify that claim (or not) using the paper's model if I find some time. In any case, I think the underlying point is that these language models serve a purpose, but will not uncover an underlying structure for you or derive something like the phrase structure grammar proposed in Syntactic Structures. I may be extrapolating a bit based on other times I've seen Chomsky discuss this, though.
