Hacker News
by syllogism 3923 days ago
If you're doing NLP in Python, there's no reason to use CoreNLP's parser from the NLTK wrapper. Communicating with the Java process over the file system or a socket introduces a tonne of unnecessary complications and slowdowns, invites encoding problems, etc.

spaCy's native Cython dependency parser is both faster and more accurate than CoreNLP.

The NP chunks example from the post:

    >>> from spacy.en import English
    >>> nlp = English()
    >>> doc = nlp(u'ITP is a two-year graduate program located in the Tisch School of the Arts. Perhaps the best way to describe us is as a Center for the Recently Possible.')
    >>> for np in doc.noun_chunks:
    ...   print(np.text)
    ... 
    ITP
    a two-year graduate program
    the Tisch School
    the Arts
    the best way
    us
    a Center
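For anyone wondering how noun chunks fall out of a dependency parse: below is a simplified, self-contained sketch, not spaCy's actual code. It assumes a hand-written parse of the first sentence (labels are illustrative, and `two-year` is kept as one token); `Token`, `left_edge`, `noun_chunks` and `SENTENCE` are hypothetical names. A nominal token carrying a subject/object-style label yields a chunk running from the leftmost token of its subtree up to the head itself, which is why "located in the Tisch School..." is not part of "a two-year graduate program".

```python
from dataclasses import dataclass
from typing import List

# Illustrative subset of the labels spaCy's English noun_chunks iterator checks;
# the real iterator also handles conjuncts, which this sketch ignores.
NP_LABELS = {"nsubj", "nsubjpass", "dobj", "dative", "pobj", "attr", "appos", "ROOT"}
NOMINAL_POS = {"NOUN", "PROPN", "PRON"}


@dataclass
class Token:
    text: str
    pos: str   # coarse part-of-speech tag
    dep: str   # dependency label
    head: int  # index of the syntactic head; the root points at itself


def left_edge(tokens: List[Token], i: int) -> int:
    """Smallest token index in the subtree rooted at token i (naive recursion)."""
    edge = i
    for j, tok in enumerate(tokens):
        if j != i and tok.head == i:  # token j is a direct child of token i
            edge = min(edge, left_edge(tokens, j))
    return edge


def noun_chunks(tokens: List[Token]) -> List[str]:
    """Return one chunk per nominal NP head: left edge of its subtree up to the head."""
    chunks = []
    prev_end = -1
    for i, tok in enumerate(tokens):
        if tok.pos not in NOMINAL_POS or tok.dep not in NP_LABELS:
            continue
        start = left_edge(tokens, i)
        if start <= prev_end:  # skip a chunk that would overlap the previous one
            continue
        # The chunk stops at the head, so post-modifiers are excluded.
        chunks.append(" ".join(t.text for t in tokens[start:i + 1]))
        prev_end = i
    return chunks


# Hypothetical hand-written parse of "ITP is a two-year graduate program
# located in the Tisch School of the Arts."
SENTENCE = [
    Token("ITP", "PROPN", "nsubj", 1),
    Token("is", "AUX", "ROOT", 1),
    Token("a", "DET", "det", 5),
    Token("two-year", "ADJ", "amod", 5),
    Token("graduate", "NOUN", "compound", 5),
    Token("program", "NOUN", "attr", 1),
    Token("located", "VERB", "acl", 5),
    Token("in", "ADP", "prep", 6),
    Token("the", "DET", "det", 10),
    Token("Tisch", "PROPN", "compound", 10),
    Token("School", "PROPN", "pobj", 7),
    Token("of", "ADP", "prep", 10),
    Token("the", "DET", "det", 13),
    Token("Arts", "PROPN", "pobj", 11),
    Token(".", "PUNCT", "punct", 1),
]

if __name__ == "__main__":
    for chunk in noun_chunks(SENTENCE):
        print(chunk)
```

Run as a script, this prints ITP, a two-year graduate program, the Tisch School and the Arts, i.e. the first four chunks above. Compound modifiers like "graduate" and "Tisch" are skipped as heads because their label isn't in the set, so they only appear inside their head's chunk.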

spaCy looks like a great product, but it is expensive.

edit: sorry, I just noticed that it is available for free under the AGPL 3 license.

Now 100% free under the MIT license. Things change by the hour in spaCy world :-)
It looks like a lot of good work went into spaCy; I hope that you are successful monetizing it with the MIT license.