Hacker News
by Arjuna 4783 days ago
This is neat.

Shlomib, you might be interested in SHRDLU [1][2] if you are not aware of it. It was developed by Terry Winograd [3] for his dissertation [4] at MIT. It is a natural language understanding [5] parser that allows you to interact with a small world of 3D solids.

I think you will find the paper interesting, because it goes into detail on sentence structure and associated parsing.

Here is a sample dialogue with the program, taken from Wikipedia:

  Person:   Pick up a big red block.
  Computer: OK.
  Person:   Grasp the pyramid.
  Computer: I don't understand which pyramid you mean.
  Person (changing their mind): Find a block which is taller than the one you are holding and put it into the box.
  Computer: By "it", I assume you mean the block which is taller than the one I am holding.
  Computer: OK.
  Person:   What does the box contain?
  Computer: The blue pyramid and the blue block.
  Person:   What is the pyramid supported by?
  Computer: The box.
  Person:   How many blocks are not in the box?
  Computer: Four of them.
  Person:   Is at least one of them narrower than the one which I told you to pick up?
  Computer: Yes, the red cube.

[1] https://en.wikipedia.org/wiki/SHRDLU

[2] http://hci.stanford.edu/winograd/shrdlu/

[3] https://en.wikipedia.org/wiki/Terry_Winograd

[4] http://hci.stanford.edu/winograd/shrdlu/AITR-235.pdf

[5] https://en.wikipedia.org/wiki/Natural_language_understanding

2 comments

Why does this seem so scary/amazing?

It's basically taking what computers already do really well today and adding a language "mask" onto it. Yet it still blows me away.

What's really amazing is how old SHRDLU is. It was developed back in the late '60s and early '70s.

Although... I guess you could look at it two ways: Be amazed at what SHRDLU could do in 1970, or be disappointed that, given that we had that in 1970, we don't have the "Star Trek Computer" yet in 2013.

SHRDLU is definitely amazing, especially given its age, but one's amazement is tempered a little bit (or maybe enhanced, depending on perspective) when you realize that it achieved what it did primarily through really great engineering rather than some fundamental insight about language. Since SHRDLU's world is so limited, Winograd was able to explicitly program every facet of its language understanding. Unsurprisingly, this approach is totally not scalable, and this reveals a little about why we don't have fully human-like language programs.
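To give a feel for what "explicitly program every facet" means, here is a minimal Python sketch of a hand-written blocks world. It is not Winograd's code or design (SHRDLU was far richer); the `WORLD` table, the `PICK_UP` pattern, and the `respond` function are all made up for illustration. Every sentence shape it handles had to be written out by hand, which is exactly why the approach stops scaling.

```python
import re

# Toy world: every object and property is listed by hand (illustrative only).
WORLD = {
    "b1": {"shape": "block", "color": "red", "size": "big", "in_box": False},
    "b2": {"shape": "block", "color": "blue", "size": "small", "in_box": True},
    "p1": {"shape": "pyramid", "color": "blue", "size": "small", "in_box": True},
    "p2": {"shape": "pyramid", "color": "green", "size": "small", "in_box": False},
}
STATE = {"holding": None}

# One explicit pattern per sentence shape; anything else is not understood.
PICK_UP = re.compile(
    r"(?:pick up|grasp) (a|the)(?: (big|small))?(?: (red|blue|green))? (block|pyramid)"
)


def find(shape, color=None, size=None):
    """Return the ids of all objects matching the description."""
    return [
        oid for oid, obj in WORLD.items()
        if obj["shape"] == shape
        and (color is None or obj["color"] == color)
        and (size is None or obj["size"] == size)
    ]


def respond(sentence):
    """Answer one sentence using only the hand-written rules above."""
    s = sentence.lower().rstrip(".?!")
    m = PICK_UP.fullmatch(s)
    if m:
        article, size, color, shape = m.groups()
        matches = find(shape, color, size)
        if not matches:
            return f"I can't find such a {shape}."
        # "the" demands a unique referent; "a" accepts any match.
        if article == "the" and len(matches) > 1:
            return f"I don't understand which {shape} you mean."
        STATE["holding"] = matches[0]
        return "OK."
    if s == "what does the box contain":
        items = [f"the {o['color']} {o['shape']}" for o in WORLD.values() if o["in_box"]]
        return " and ".join(items).capitalize() + "."
    return "I don't understand."


if __name__ == "__main__":
    for line in ["Pick up a big red block.", "Grasp the pyramid.", "What does the box contain?"]:
        print(line, "->", respond(line))
```

Two sentence shapes already take this much code, and adding "taller than the one you are holding" would mean another hand-written rule plus a model of height.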
I think people underestimate how explicitly programmed language is in humans. I'm starting to think that this might be the central problem in NLP right now.

Humans have good natural pattern-matching engines in their heads, but the entire body of syntax and vocabulary available to a person is the result of the memorization of a huge amount of text. I suspect the majority of people rarely ever develop truly novel words or phrases on their own (with the notable exception of Lewis Carroll). (Aside: in fact, this is exactly how "memes" work in the modern online sense; one person invents a novel word or phrase, and that is then parroted by a huge number of other people.)

I recently started work on an attempt to improve the classification of English vocabulary by grade level. I built a database using publicly available sources, and the number of unique words that the average child has been exposed to by the 8th grade is mind-boggling. One source cited 15,000 unique words and over a million words read annually.

Aside from the words themselves, children have also by that age memorized an even larger number of phrases, pieces of sentence structure, and full sentences.

I think that because we aren't able to enumerate everything we've memorized, we don't fully appreciate just how much data is stored in our heads. As a result, I think it's possible that computer science researchers have largely been chasing a ghost in terms of some kind of magical "understanding" of language; the answer to NLP might actually be to simply store and access a terabyte-sized data structure of vocabulary and phrases.

The kind of "programming" that you are describing is fundamentally different from what Winograd did, and that was my point. This learning from many examples is an instance of inductive inference, and the complexity involved is why modern NLP research (and you in your project) uses machine learning techniques with massive datasets -- this more closely mimics the way we naturally acquire language. Trying to hand-engineer all those rules and dependencies and exceptions is prohibitively difficult, which is why we have Siri and not SHRDLU+.

Just because we memorize a whole lot (which I agree with) does not mean that language is likely to be "pre-programmed" in the way that SHRDLU follows explicit, exhaustive rules. Formulating such rules requires planning because they are brittle, and this does not seem compatible with the way language acquisition happens.

Also, after accepting the premise that humans exploit an enormous store of data in language use, there still remain very difficult questions about what kind of representations we have available, and how powerful the search and recombination mechanisms are.

Memory-based language processing has existed for some time now, and while it is useful, it is certainly not the final answer to "the central problem in NLP" (however you define that; I'd suggest ambiguity resolution).
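For anyone who hasn't seen the memory-based idea, here is a rough Python sketch of it: store labelled examples verbatim and tag a new item by looking up its nearest neighbours. The `MEMORY` table, the three-word window, and the overlap score are assumptions chosen for illustration, not any particular system's design.

```python
from collections import Counter

# Hypothetical stored examples: (previous word, word, next word) -> tag.
MEMORY = [
    (("the", "block", "is"), "NOUN"),
    (("a", "box", "contains"), "NOUN"),
    (("the", "red", "block"), "ADJ"),
    (("a", "big", "pyramid"), "ADJ"),
    (("you", "grasp", "the"), "VERB"),
    (("i", "hold", "a"), "VERB"),
]


def overlap(a, b):
    """Count the positions at which two context windows agree."""
    return sum(x == y for x, y in zip(a, b))


def classify(window, k=3):
    """Tag the middle word by an overlap-weighted vote of the k nearest examples."""
    ranked = sorted(MEMORY, key=lambda ex: overlap(window, ex[0]), reverse=True)
    votes = Counter()
    for context, tag in ranked[:k]:
        votes[tag] += overlap(window, context)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # "blue" and "lift" were never stored; only their contexts look familiar.
    print(classify(("the", "blue", "block")))
    print(classify(("you", "lift", "the")))
```

Nothing here generalizes beyond surface similarity to what was stored, so the open questions about representations and recombination stay open.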

>> the answer to NLP might actually be to simply store and access a terabytes-sized data structure of vocabulary and phrases.

Isn't that effectively what Google Translate is doing? And its results are... varied.

I get the impression that Google Translate is strictly doing it in a Bayesian sense. For example, the recent "he praised the iPad" debacle. [1][2]

[1] http://code.google.com/p/android/issues/detail?id=38538

[2] http://techcrunch.com/2013/01/04/google-now-and-google-trans...

Based on my experience, the hilarious thing about NLP is that it is easy for humans to generate easy-to-parse sentences like "Facebook acquires Instagram.", but if you are trying to parse a naturally flowing conversation, you rarely get easy examples like that. There is so much context in our conversations.

>> I recently started work on an attempt to improve the classification of English vocabulary by grade level.

I would be interested in this. Let me know if you plan to open this. What data sources are you using?

I'll provide an API for it, won't be ready for months though. It's not a big priority yet -- part of a larger project.

The data sources aren't that interesting. After trying for a while to find something already pre-compiled, I quit and resorted to Googling for phrases like, "9th grade spelling list", and aggregating the data from the results by hand. There are a bunch of sites for teachers and home educators and the like that include tables of vocabulary for various grades. It's tedious, but it works.
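Roughly, the aggregation step looks like the Python sketch below. The sample `SOURCES` data and the rule of keeping the lowest grade in which a word appears are simplifying assumptions for illustration, not a description of the actual database.

```python
# Hypothetical word lists, as they might be copied by hand from different sites.
SOURCES = [
    {"grade": 3, "words": ["Because", "friend", "island"]},
    {"grade": 5, "words": ["island", "ancient", "  Journey "]},
    {"grade": 9, "words": ["ancient", "ambiguous", "journey"]},
]


def build_grade_index(sources):
    """Map each normalized word to the lowest grade in which it appears."""
    index = {}
    for source in sources:
        for raw in source["words"]:
            word = raw.strip().lower()
            if not word:
                continue  # skip blank entries left over from copy-pasting
            index[word] = min(source["grade"], index.get(word, source["grade"]))
    return index


def words_for_grade(index, grade):
    """Return the words first introduced in the given grade, sorted."""
    return sorted(word for word, g in index.items() if g == grade)


if __name__ == "__main__":
    index = build_grade_index(SOURCES)
    for grade in (3, 5, 9):
        print(grade, words_for_grade(index, grade))
```

Taking the minimum is only one way to reconcile sources that disagree; averaging the grades or voting across lists would be just as defensible.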

>> Since SHRDLU's world is so limited, Winograd was able to explicitly program every facet of its language understanding. Unsurprisingly, this approach is totally not scalable and this reveals a little about why we don't have fully human-like language programs.

That's a good point. It does lead one to wonder, however, if techniques inspired by SHRDLU could (or do) have a use in domain-specific applications where the world is likewise restricted. Given the increases in raw horsepower available since SHRDLU was first developed, I find myself wondering if we couldn't do some pretty useful things today, using this approach.

Yes. For example, consider interlingual machine translation. Most systems today (like Google) use statistical MT that learns patterns from millions of examples. In interlingua, by contrast, you analyze the input sentence to form a language-independent representation of the sentence's meaning. Then you use that representation to generate a sentence in a new language.

As you might expect, this is basically impossible for wide-domain MT because we don't have unambiguous representations of the meaning of every sentence, and we don't necessarily know how to combine them, and there are a lot of non-compositional phrases, and on and on.

However, if we restrict ourselves to one small domain, interlingua can work. For example, the KANT system [1] is an interlingua that is built for translating technical manuals for Caterpillar products (bulldozers and so on). The input has to be written in a restricted subset of English (Caterpillar Technical English), but then you can analyze it exactly with hand-written rules, and produce exact output in the target language.
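As a toy illustration of the analyze-then-generate pipeline (not how KANT itself is built), here is a Python sketch. The three-verb, three-noun vocabulary, the frame format, and the Spanish and German tables are all invented for the example.

```python
import re

# Controlled vocabulary: English surface word -> language-neutral concept.
ACTIONS = {"replace": "REPLACE", "check": "CHECK", "clean": "CLEAN"}
OBJECTS = {"filter": "FILTER", "engine": "ENGINE", "hose": "HOSE"}

# Per-language generation tables (formal imperative, noun with its article).
GENERATORS = {
    "es": {
        "actions": {"REPLACE": "Reemplace", "CHECK": "Revise", "CLEAN": "Limpie"},
        "objects": {"FILTER": "el filtro", "ENGINE": "el motor", "HOSE": "la manguera"},
    },
    "de": {
        "actions": {"REPLACE": "Ersetzen Sie", "CHECK": "Kontrollieren Sie", "CLEAN": "Reinigen Sie"},
        "objects": {"FILTER": "den Filter", "ENGINE": "den Motor", "HOSE": "den Schlauch"},
    },
}


def analyze(sentence):
    """Parse '<verb> the <noun>.' into a language-independent meaning frame."""
    m = re.fullmatch(r"(\w+) the (\w+)\.?", sentence.strip().lower())
    if not m or m.group(1) not in ACTIONS or m.group(2) not in OBJECTS:
        raise ValueError(f"outside the controlled language: {sentence!r}")
    return {"action": ACTIONS[m.group(1)], "object": OBJECTS[m.group(2)]}


def generate(frame, lang):
    """Render a meaning frame as a sentence in the target language."""
    table = GENERATORS[lang]
    return f"{table['actions'][frame['action']]} {table['objects'][frame['object']]}."


def translate(sentence, lang):
    """Translate by going through the frame, never directly between languages."""
    return generate(analyze(sentence), lang)


if __name__ == "__main__":
    print(analyze("Replace the filter."))
    print(translate("Replace the filter.", "es"))
    print(translate("Check the hose.", "de"))
```

Adding a target language only means adding a generation table, since analysis never changes. Anything outside the controlled language is rejected rather than guessed at.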

[1] http://www2.lti.cs.cmu.edu/Research/Kant/

Firstly, we have done similar things. For example, we have/had the METEO system (http://en.wikipedia.org/wiki/METEO_System) for weather reports. (Search for "machine translation weather reports" to find the scientific literature; among other things, that turns up work being done on a Croatian version.) I think there have been successes in the medical field, too, but I cannot find them.

However, this 'knowledge engineering' approach to AI has fallen somewhat out of fashion in favour of statistical methods. That said, I don't think anybody does statistics 'from scratch'. For example, in NLP, you could try to statistically learn the definite articles in English, but hard-coding that 'the' is the only one will get you results faster.
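A minimal Python sketch of that kind of hybrid, assuming a most-frequent-tag model as the stand-in for "statistical methods". The tiny `CORPUS`, the tag names, and the `HARD_CODED` table are hypothetical.

```python
from collections import Counter, defaultdict

# Hand-written knowledge: 'the' is the only definite article in English.
HARD_CODED = {"the": "DET"}

# Hypothetical tagged corpus, far too small for real use.
CORPUS = [
    ("the", "DET"), ("rain", "NOUN"), ("will", "AUX"), ("stop", "VERB"),
    ("rain", "NOUN"), ("is", "AUX"), ("likely", "ADJ"),
    ("it", "PRON"), ("may", "AUX"), ("rain", "VERB"),
]


def train(corpus):
    """Count how often each word was seen with each tag."""
    counts = defaultdict(Counter)
    for word, tag in corpus:
        counts[word][tag] += 1
    return counts


def tag(word, counts, default="NOUN"):
    """Prefer hard-coded facts, then learned counts, then a default guess."""
    word = word.lower()
    if word in HARD_CODED:
        return HARD_CODED[word]
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return default


if __name__ == "__main__":
    counts = train(CORPUS)
    print([tag(w, counts) for w in ["The", "rain", "will", "snow"]])
    # The hard-coded rule works even with no training data at all.
    print(tag("the", train([])))
```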

Whoa, that's ELIZA for CAD on steroids.