by gilesc
5589 days ago
The best toolkits are probably in Java:

- Stanford's Tagger, Parser, and CoreNLP
- Apache OpenNLP
- LingPipe

Many smaller components are built to be compatible with IBM UIMA (of Watson fame), so they can be integrated into a pipeline fairly easily. For examples of this in biomedical text mining, see http://u-compare.org/ .

People will kill me for saying this, but truly: Python's performance isn't adequate for large-scale text mining, _especially_ if you want to do deep/full parsing. Shallow parsing, as shown in this package's demo, is more feasible. I personally find NLTK convoluted, but in its favor, it does have readers for a TON of corpora, which is really nice.
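To make the shallow vs. full parsing distinction concrete, here is a rough sketch of what a shallow parser (chunker) does: it groups already-tagged tokens into flat noun phrases with a single pass over the tag sequence, and never builds a full parse tree. This is not the package's demo and not NLTK's API. The `chunk_noun_phrases` function, the `NP_PATTERN` rule, and the hand-tagged sentence are all made up for illustration, and the tags are assumed to be Penn Treebank style.

```python
import re

# Hypothetical hand-tagged input (Penn Treebank-style tags). A real pipeline
# would get these pairs from a POS tagger instead.
TAGGED = [
    ("The", "DT"), ("quick", "JJ"), ("parser", "NN"), ("reads", "VBZ"),
    ("a", "DT"), ("large", "JJ"), ("corpus", "NN"),
]

# Illustrative noun-phrase rule: optional determiner, any number of
# adjectives, then one or more nouns. The leading \b keeps a match from
# starting in the middle of a longer tag such as PDT.
NP_PATTERN = re.compile(r"\b(?:DT )?(?:JJ[RS]? )*(?:NNP?S? )+")


def chunk_noun_phrases(tagged):
    """Return flat noun phrases found by matching a regex over the tags."""
    # Every tag is followed by exactly one space, so the number of spaces
    # before a character offset equals the index of the token at that offset.
    tags = "".join(tag + " " for _, tag in tagged)
    phrases = []
    for match in NP_PATTERN.finditer(tags):
        first = tags.count(" ", 0, match.start())
        last = tags.count(" ", 0, match.end())
        phrases.append(" ".join(word for word, _ in tagged[first:last]))
    return phrases


if __name__ == "__main__":
    for phrase in chunk_noun_phrases(TAGGED):
        print(phrase)
```

The work here is linear in the number of tokens, which is a big part of why chunking stays tractable in Python on large corpora. A full parser has to search over possible tree structures for each sentence, which costs far more.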