|
My understanding is different, so please correct me / supply missing information. From what I understand, the average length of a typical written sentence is n = ~27. OK, this is small by itself, but Stanford Parser (lexicalized PCFG) I am using needs about 1 second to parse a sentence of this size. Imagine how slow that is on a computer time-scale by comparing it to string-length operation on the same sentence. I do encounter many sentences that are as much as 100 words long (and reading them myself, find nothing wrong with them). At about four times the length, these take about a minute to parse! I am trying to find information about speeds of other PCFG parsers, including Collins, Charniak, Berkeley, etc. I understand dependency parsers are faster but also generally lag in accuracy. |
Shift-reduce dependency parsers are linear time, and are giving state-of-the-art results. My parser's currently a pain in the ass to install, as it hasn't really been released yet, but it does hundreds of sentences a second. Accuracy is state-of-the-art -- 92-93% depending on the beam width and the evaluation set (Stanford or MALT dependencies).
https://github.com/syllog1sm/redshift/ . You'll want the develop branch. It's GPL licensed.
It's implemented in Cython (i.e., almost all the code is Cython --- I'm not using it just for the speed critical bits), which would make it easy to work with if you're using Python. But, as I said...I don't claim it's currently fit for human consumption.
A C++ implementation of the same algorithm is here: http://www.sutd.edu.sg/yuezhang.aspx . Note his papers too -- he did some of the important work on this line of research.
The last few years of work in shift-reduce dependency parsing have been a bit of a break-through in parsing, imo.