by LanguageGamer 3696 days ago
According to the paper linked by the original announcement [1], the parser scores 94.41% for unlabeled attachment on the Wall Street Journal corpus [2], a parsed and labeled data set of 30 million words.
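
For anyone unfamiliar with the metric: unlabeled attachment score is, roughly, the fraction of tokens whose predicted head word matches the head in the gold-standard parse, ignoring the dependency label. Here is a minimal sketch of that calculation. The function name and the toy sentence are my own illustration, not anything from the paper, and published evaluations often exclude punctuation tokens, which this sketch does not do.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head index.

    Each list holds one head index per token (0 = root). Hypothetical helper
    for illustration only; it is not taken from the paper or the parser's code.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted parses must cover the same tokens")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Toy sentence "The cat sat down" with 1-based token positions.
# Gold heads: The -> cat (2), cat -> sat (3), sat -> root (0), down -> sat (3)
gold = [2, 3, 0, 3]
# A made-up prediction that attaches "down" to "cat" instead of "sat".
predicted = [2, 3, 0, 2]

score = unlabeled_attachment_score(gold, predicted)
print(f"UAS = {score:.2%}")  # 3 of 4 heads correct
```

A score of 94.41% therefore means that about 1 token in 18 is still attached to the wrong head.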

This corpus is a standard for NLP research on English syntax, but I think it's worth remembering that there is a great deal of disagreement among linguists about what the syntax of English is and what the lexical categories are.

[1] http://googleresearch.blogspot.com/2016/05/announcing-syntax...
[2] https://catalog.ldc.upenn.edu/LDC2000T43