by andreasvc 3690 days ago
That's basically Chomsky's argument with the "colorless green ideas" sentence: if you put words together to form a sentence that has never been seen before, a statistical model supposedly cannot help you. The thing is, a later paper showed that a simple Markov model is in fact quite able to discriminate this grammatical sentence from an ungrammatical one. Novel and surprising sentences are never completely alien. They reuse familiar structures of the language, and combinations of words and other building blocks that we have seen before, and this is what gets exploited when we analyze such sentences. Surprise and novelty are themselves statistical notions (cf. information theory, where the surprise of an event is measured by how improbable it is).
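To make the point concrete, here is a rough sketch of how a Markov model can separate the two word orders even though none of the word pairs were ever observed. This is not the model from the paper mentioned above, just a toy illustration: the word classes in WORD_CLASS, the four-sentence CORPUS, and the add-one smoothing are all assumptions made up for this example. The model is a bigram model over word classes, and the score is surprisal in bits (negative log2 probability), so lower means less surprising.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

# Hypothetical toy lexicon: word -> coarse class, hand-assigned for illustration.
WORD_CLASS = {
    "colorless": "ADJ", "green": "ADJ", "big": "ADJ", "old": "ADJ",
    "ideas": "NOUN", "dogs": "NOUN", "children": "NOUN",
    "sleep": "VERB", "bark": "VERB", "play": "VERB",
    "furiously": "ADV", "loudly": "ADV", "quietly": "ADV",
}

# Hypothetical training corpus. It shares no word bigram with the test sentence.
CORPUS = [
    "big dogs bark loudly",
    "old dogs sleep quietly",
    "children play loudly",
    "big old dogs bark",
]


def to_classes(sentence):
    """Map a sentence to its class sequence, with boundary markers."""
    return [BOS] + [WORD_CLASS[w] for w in sentence.lower().split()] + [EOS]


def train(corpus):
    """Count class bigrams and how often each class occurs as a context."""
    bigrams, contexts = Counter(), Counter()
    for sentence in corpus:
        seq = to_classes(sentence)
        for prev, nxt in zip(seq, seq[1:]):
            bigrams[(prev, nxt)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def surprisal_bits(sentence, bigrams, contexts):
    """Total surprisal of a sentence under the add-one-smoothed class bigram model."""
    outcomes = len(set(WORD_CLASS.values())) + 1  # every class, plus the end marker
    seq = to_classes(sentence)
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        p = (bigrams[(prev, nxt)] + 1) / (contexts[prev] + outcomes)
        total -= math.log2(p)
    return total


bigrams, contexts = train(CORPUS)
grammatical = surprisal_bits("colorless green ideas sleep furiously", bigrams, contexts)
scrambled = surprisal_bits("furiously sleep ideas green colorless", bigrams, contexts)
print(f"grammatical: {grammatical:.1f} bits, scrambled: {scrambled:.1f} bits")
```

The grammatical order comes out less surprising because its class transitions (adjective to noun, noun to verb, and so on) occur in the training data, while the scrambled order is built almost entirely from unseen transitions. A plain word-level bigram model would not separate the two on this corpus, since every word pair in both orders is unseen; the generalization comes from the shared classes.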