HN Mirror

by xigency 3692 days ago

Really, all of these parsers, SyntaxNet included, work the same way: they use statistical training data to train a neural network. Here's a paper on the Stanford CoreNLP parser that you can compare with Google's: http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf

So all of the above parsers share a weakness: they output only a single best parse, when in reality a sentence can have more than one valid structure, the prime example being the second sentence you provided. No, I don't think Google's model has a better sense of humor than the others. I expect they were all trained on fairly similar data.
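To make the ambiguity concrete, here's a small sketch. It swaps in a technique neither parser actually uses: a hypothetical toy constituency grammar (the rules and word categories are made up just for this sentence) with a CKY-style chart that keeps every tree instead of only the best one. Because "flies" can be a noun or a verb and "like" a verb or a preposition, two complete trees survive.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "N", "N"),    # compound noun: "fruit flies"
    ("NP", "Det", "N"),  # "a banana"
    ("VP", "V", "NP"),   # "like a banana" with "like" as a verb
    ("VP", "V", "PP"),   # "flies like a banana"
    ("PP", "P", "NP"),   # "like a banana" with "like" as a preposition
]

# Hypothetical lexicon: a word may carry several categories, which is
# where the ambiguity comes from.
LEXICON = {
    "fruit": {"N", "NP"},
    "flies": {"N", "V"},
    "like": {"V", "P"},
    "a": {"Det"},
    "banana": {"N"},
}


def all_parses(words, start="S"):
    """Return every tree (as a bracketed string) spanning the whole sentence."""
    n = len(words)
    # chart[i][j] maps a category to the list of trees covering words[i:j].
    chart = [[defaultdict(list) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in sorted(LEXICON[w]):
            chart[i][i + 1][cat].append(f"({cat} {w})")
    # Fill longer spans from shorter ones, keeping all combinations.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    for lt, rt in product(chart[i][k][left], chart[k][j][right]):
                        chart[i][j][parent].append(f"({parent} {lt} {rt})")
    return chart[0][n][start]


# Prints two trees: one with "flies" as the verb, one with "like" as the verb.
for tree in all_parses("fruit flies like a banana".split()):
    print(tree)
```

A parser that returns only its single best analysis has to commit to one of these two trees, and which one it picks depends on what it saw during training.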

However, there is probably a trivial way to get the second sentence to parse as

      Subject --- Verb --- Object
     Noun       Verb   Article  Noun
      |   \       |     |        |
    Fruit flies  like   a      banana .

and that is to provide training data with more occurrences of ...

  > N{Fruit flies} V{like} honey. 
  > N{Fruit flies} V{like} sugar water.

than occurrences of

  > A plane V{flies} PREP{like} a bird.

The more similes the parser sees during training, the less likely the neural net is to tag 'like' as a verb. The result is also affected by every occurrence of [flies like] in the data.

That's the nature of statistical language tools.
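Here's a rough illustration of that frequency effect, assuming a drastically simplified model: instead of a neural network, it just counts which tag "like" received right after "flies" in a hypothetical hand-tagged corpus (written in the notation above, plus a made-up DET tag) and picks the most common one.

```python
from collections import Counter

# Hypothetical hand-tagged sentences in the notation used above:
# N = noun, V = verb, PREP = preposition, DET = determiner (added here).
SIMILE = [("A", "DET"), ("plane", "N"), ("flies", "V"), ("like", "PREP"),
          ("a", "DET"), ("bird", "N")]
HONEY = [("Fruit", "N"), ("flies", "N"), ("like", "V"), ("honey", "N")]
SUGAR = [("Fruit", "N"), ("flies", "N"), ("like", "V"), ("sugar", "N"),
         ("water", "N")]


def tag_counts(corpus, prev_word, word):
    """Count the tags given to `word` whenever it directly follows `prev_word`."""
    counts = Counter()
    for sentence in corpus:
        for (w1, _), (w2, t2) in zip(sentence, sentence[1:]):
            if w1.lower() == prev_word and w2.lower() == word:
                counts[t2] += 1
    return counts


def guess_tag(corpus, prev_word, word):
    """Pick the most frequent tag for `word` after `prev_word`, or None if unseen."""
    counts = tag_counts(corpus, prev_word, word)
    return counts.most_common(1)[0][0] if counts else None


simile_heavy = [SIMILE, SIMILE, SIMILE, HONEY]
fruit_heavy = [SIMILE, HONEY, SUGAR, HONEY, SUGAR]
print(guess_tag(simile_heavy, "flies", "like"))  # PREP (3 vs. 1)
print(guess_tag(fruit_heavy, "flies", "like"))   # V (4 vs. 1)
```

A real neural parser weighs far more context than the previous word, but the intuition carries over: shift the balance of the training data and the preferred reading tends to shift with it.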

The stock parser released here gives the same answer as CoreNLP, by the way:

    flies VBZ ROOT
     +-- Fruit NNP nsubj
     +-- like IN prep
     |   +-- banana NN pobj
     |       +-- a DT det
     +-- . . punct

So much for Parsey McParseface's sense of humor.
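For reference, the tree printed above is just a drawing of head/label arcs. The sketch below rebuilds it from CoNLL-style rows (index, word, tag, head, label) transcribed from that output; the row layout and the `render_tree` helper are a hypothetical illustration, not the tool that produced the output.

```python
from collections import defaultdict

# CoNLL-style rows: (index, word, POS tag, head index, dependency label).
# Head 0 marks the root. Transcribed from the parser output above.
ROWS = [
    (1, "Fruit", "NNP", 2, "nsubj"),
    (2, "flies", "VBZ", 0, "ROOT"),
    (3, "like", "IN", 2, "prep"),
    (4, "a", "DT", 5, "det"),
    (5, "banana", "NN", 3, "pobj"),
    (6, ".", ".", 2, "punct"),
]


def render_tree(rows):
    """Draw a dependency tree as indented text, children in sentence order."""
    by_index = {r[0]: r for r in rows}
    children = defaultdict(list)
    for idx, _, _, head, _ in rows:
        children[head].append(idx)
    lines = []

    def walk(idx, prefix):
        kids = sorted(children[idx])
        for n, kid in enumerate(kids):
            _, word, tag, _, label = by_index[kid]
            lines.append(f"{prefix}+-- {word} {tag} {label}")
            # Continue the vertical bar only if more siblings follow.
            last = n == len(kids) - 1
            walk(kid, prefix + ("    " if last else "|   "))

    for root in sorted(children[0]):
        _, word, tag, _, label = by_index[root]
        lines.append(f"{word} {tag} {label}")
        walk(root, " ")
    return "\n".join(lines)


print(render_tree(ROWS))
```

Reading the rows makes the parser's choice explicit: "flies" (VBZ) heads the sentence and "like" (IN) is a preposition, i.e. the plane-flies-like-a-bird reading.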