by shmageggy
4783 days ago
SHRDLU is definitely amazing, especially given its age, but the amazement is tempered a little (or maybe enhanced, depending on your perspective) when you realize that it achieved what it did primarily through really great engineering rather than some fundamental insight about language. Since SHRDLU's world is so limited, Winograd was able to explicitly program every facet of its language understanding. Unsurprisingly, this approach doesn't scale at all, which reveals a little about why we don't have fully human-like language programs.
Humans have good natural pattern-matching engines in their heads, but the entire body of syntax and vocabulary available to a person is the result of memorizing a huge amount of text. I suspect most people rarely develop truly novel words or phrases on their own (with notable exceptions such as Lewis Carroll). (Aside: in fact, this is exactly how "memes" work in the modern online sense; one person invents a novel word or phrase, and it is then parroted by a huge number of other people.)
I recently started working on improving the classification of English vocabulary by grade level. I built a database using publicly available sources, and the number of unique words that the average child has been exposed to by the 8th grade is mind-boggling. One source cited 15,000 unique words and over a million words read annually.
Aside from the words themselves, by that age children have also memorized an even larger number of phrases, pieces of sentence structure, and full sentences.
I think that because we aren't able to enumerate everything we've memorized, we don't fully appreciate just how much data is stored in our heads. As a result, I think it's possible that computer science researchers have largely been chasing a ghost in terms of some kind of magical "understanding" of language; the answer to NLP might actually be to simply store and access a terabyte-sized data structure of vocabulary and phrases.
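To make the "store and access" idea a bit more concrete, here is a toy sketch of what such a phrase store might look like at a very small scale. It is only an illustration under my own assumptions: the names `build_phrase_store` and `recall_next` are made up for this example, the three-sentence corpus is invented, and a real version would need vastly more data and a smarter lookup than exact matching.

```python
from collections import defaultdict


def build_phrase_store(sentences, max_n=3):
    """Memorize, for every run of up to max_n words, the words seen right after it."""
    store = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            # Stop one word early so there is always a following word to record.
            for i in range(len(words) - n):
                store[tuple(words[i:i + n])].add(words[i + n])
    return store


def recall_next(store, context, max_n=3):
    """Return the memorized continuations for the longest stored suffix of context."""
    words = context.lower().split()
    for n in range(min(max_n, len(words)), 0, -1):
        key = tuple(words[-n:])
        if key in store:
            return sorted(store[key])
    return []  # nothing memorized for this context


# Hypothetical toy corpus, loosely in the spirit of a blocks world.
corpus = [
    "pick up the red block",
    "pick up the green pyramid",
    "put the red block on the table",
]
store = build_phrase_store(corpus)

print(recall_next(store, "pick up the"))
print(recall_next(store, "the red"))
print(recall_next(store, "purple"))
```

The lookup here is pure recall: it can only continue phrases it has literally seen, which is exactly the behavior I'm suggesting might go a long way once the store is large enough.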