Hacker News
by linkydinkandyou 3914 days ago
Hmmm. I tried "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo."

Didn't get it right.

(See https://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffal... )

Did better on "Colorless green ideas sleep furiously."


But that Buffalo sentence is something on which people also stumble. A machine can be contrived to pass such a test case, while remaining a lousy parser compared to people.

The real problem is that the parser comes so swiftly to a wrong conclusion and cheerfully presents it as a valid result.

It would look a lot better if it simply reported "error: cannot parse that". (Better yet, with reasons: "I cannot parse that because I get stuck on this specific ambiguity and it's just too much for me.")

Also, what about the possibility of multiple results? Language is ambiguous. If something has two parses, it's wrong to assert just one.

This thing gives no consideration whatsoever to the possibility that even a single instance of "buffalo" in the sentence might be a verb, which flies in the face of almost any English noun being verbable.
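As a rough illustration of how small that space actually is, here is a minimal Python sketch assuming a hypothetical two-entry lexicon (`LEXICON` and `tag_candidates` are names made up for this example, not anything from the parser being discussed). It lists every noun/verb assignment the lexicon allows, treating each lowercase "buffalo" as either a noun or a verb:

```python
from itertools import product

# Hypothetical lexicon for illustration only: the city name is a proper-noun
# modifier (PN); lowercase "buffalo" may be a noun (N) or a verb (V).
LEXICON = {"Buffalo": ("PN",), "buffalo": ("N", "V")}

def tag_candidates(sentence):
    """Return every (word, tag) sequence the lexicon allows for `sentence`."""
    words = sentence.rstrip(".").split()
    # product() receives one tuple of possible tags per word and yields
    # every combination, i.e. the Cartesian product of the choices.
    return [list(zip(words, tags)) for tags in product(*(LEXICON[w] for w in words))]

candidates = tag_candidates("Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.")
print(len(candidates))  # 32: five lowercase tokens with two choices each
```

With only 32 candidates, a parser could afford to check every one against its grammar rather than committing to the single tagging it judges most likely.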

But it won't ever have trouble, because it's not trying to understand the sentence. It will tell you the most probable parsing of that sentence based on its model, whether or not it makes sense to a human.
People who manage to parse the sentence aren't trying to understand it either, except insofar as "Buffalo" is a proper noun denoting a city, which can form the phrase "Buffalo buffalo" (buffalo of/from/belonging to/related to Buffalo). Beyond that, they try various combinations of interpreting "buffalo" as a noun (in various roles: subject, direct object, and so on) or as a verb, and they infer elided words, such as the complementizers "which" or "that" that introduce relative phrases and embedded clauses.

It's almost purely syntactic reasoning. Searching these spaces of possibilities is something which, you would think, a "natural language parser" ought to be doing to earn its name.
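A minimal sketch of that kind of exhaustive syntactic search, assuming a hypothetical six-rule toy grammar rather than the grammar of the parser under discussion. The names `RULES`, `LEXICON`, `all_parses`, and `report` are invented for this example, and `RC` stands for a reduced relative clause whose "that" has been elided. The code memoizes every parse of each category over each span of words, so an ambiguous input yields several results and an unparseable one yields none, which it reports as an error instead of guessing:

```python
from functools import lru_cache

# Hypothetical toy grammar (an assumption for illustration only).
# "Buffalo" can only be a proper-noun modifier (PN); lowercase "buffalo"
# can be a noun (N) or a verb (V). RC is a relative clause with "that" elided.
LEXICON = {"Buffalo": {"PN"}, "buffalo": {"N", "V"}}
RULES = {
    "S": [("NP", "VP")],
    "NP": [("PN", "N"), ("N",), ("NP", "RC")],
    "RC": [("NP", "V")],
    "VP": [("V", "NP")],
}

def all_parses(sentence, start="S"):
    """Return every tree (as nested tuples) deriving `sentence` from `start`."""
    words = tuple(sentence.rstrip(".").split())

    @lru_cache(maxsize=None)
    def parse(cat, i, j):
        trees = []
        # Preterminal: a single word whose lexicon entry includes `cat`.
        if j == i + 1 and cat in LEXICON.get(words[i], set()):
            trees.append((cat, words[i]))
        for rhs in RULES.get(cat, []):
            if len(rhs) == 1:
                # Unary rule such as NP -> N covers the same span.
                trees.extend((cat, t) for t in parse(rhs[0], i, j))
            else:
                left, right = rhs
                # Try every split point; both halves are non-empty, so the
                # left-recursive NP -> NP RC always recurses on a shorter span.
                for k in range(i + 1, j):
                    for lt in parse(left, i, k):
                        for rt in parse(right, k, j):
                            trees.append((cat, lt, rt))
        return tuple(trees)

    return list(parse(start, 0, len(words)))

def report(sentence):
    """Report failure or ambiguity instead of silently picking one parse."""
    parses = all_parses(sentence)
    if not parses:
        return "error: cannot parse that"
    if len(parses) > 1:
        return f"ambiguous: {len(parses)} parses"
    return "1 parse"

print(report("Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo."))  # 1 parse
print(report("buffalo buffalo buffalo buffalo buffalo"))  # ambiguous: 2 parses
print(report("Buffalo Buffalo"))  # error: cannot parse that
```

Under this toy grammar the Buffalo sentence has exactly one parse, because each capitalized "Buffalo" can only modify the noun right after it; five lowercase "buffalo"s in a row, by contrast, yield two.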

Nobody actually knows what it means "to buffalo" something; it is not necessary to know. People solve the parse in spite of knowing that there is nothing to understand in the sentence.

"buffalo" can mean something like "bother" as an English verb (at least in American informal use), so the whole sentence as parsed in English does have a concrete mental image associated with it, in case that makes any difference.