|
Nice writeup. A few comments: So you're just identifying NPs and VPs in a sentence? So lets say I run your program, and I get NPs "Instagram" and "Facebook", and the VP "acquired." The question is, who did what to whom? Did Facebook acquire Instagram, or did Instagram acquire Facebook? Second, I think you're way over-emphasizing the supposed slowness of CFG parsing. Yes, the complexity is O(n^3) in the length of the sentence, but in practice, n is usually small. Modern statistical PCFG parsers are fast. |
From what I understand, the average length of a typical written sentence is n = ~27. OK, this is small by itself, but Stanford Parser (lexicalized PCFG) I am using needs about 1 second to parse a sentence of this size. Imagine how slow that is on a computer time-scale by comparing it to string-length operation on the same sentence.
I do encounter many sentences that are as much as 100 words long (and reading them myself, find nothing wrong with them). At about four times the length, these take about a minute to parse!
I am trying to find information about speeds of other PCFG parsers, including Collins, Charniak, Berkeley, etc. I understand dependency parsers are faster but also generally lag in accuracy.