| The Stanford parser is particularly slow --- it's in java, and it's written for research more than anything. The C&C CCG parser runs at about 60-80 sentences a second, although it gives either CCG constituents or dependencies -- so the output may take some interpretation. Shift-reduce dependency parsers are linear time, and are giving state-of-the-art results. My parser's currently a pain in the ass to install, as it hasn't really been released yet, but it does hundreds of sentences a second. Accuracy is state-of-the-art -- 92-93% depending on the beam width and the evaluation set (Stanford or MALT dependencies). https://github.com/syllog1sm/redshift/ . You'll want the develop branch. It's GPL licensed. It's implemented in Cython (i.e., almost all the code is Cython --- I'm not using it just for the speed critical bits), which would make it easy to work with if you're using Python. But, as I said...I don't claim it's currently fit for human consumption. A C++ implementation of the same algorithm is here: http://www.sutd.edu.sg/yuezhang.aspx . Note his papers too -- he did some of the important work on this line of research. The last few years of work in shift-reduce dependency parsing have been a bit of a break-through in parsing, imo. |
The choice of language is not what causes the slowness of the Stanford parser. It's the choice of search strategy, which trades-off speed for accuracy.
Shift-reduce dependency parsers are linear time, and are giving state-of-the-art results.
No, this is incorrect.
The choice of parsing logic (shift-reduce, dependency) and the search strategy (greedy, sometimes erroneously called "deterministic") are orthogonal. It's the greedy search strategy that leads to linear time performance.
The choice of logic determines the lower-bound (best-case) on parsing complexity. If you do exhaustive search for the exact solution of a shift-reduce dependency parser, it is worst-case exponential. In practice, you don't do exact search, and by using a beam search approximation you can get observed linear-time performance.
[edit: You can read my thesis if you are not familiar with what a parsing logic is.]
I am not aware of state-of-the-art results from greedy shift-reduce parsers. Do you mind sharing?