by languagehacker 3923 days ago
TextBlob is just an easy-to-use wrapper for a number of more involved libraries, including NLTK and Pattern.

As with most tools like it, if you're looking to hand off extremely unsophisticated NLP work to a junior developer, this is a good thing.

If you're an engineer focused on NLP, using this API would be like tying one hand behind your back. It introduces its own performance problems and obscures a number of configuration options that the APIs of the libraries it wraps expose. I also find that its approach to object orientation tends to obscure performance bottlenecks by hiding how much just-in-time computation occurs for a given string.
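To make the just-in-time point concrete, here is a rough sketch of the general pattern. `LazyText`, `words`, and `compute_calls` are made-up names for illustration, not TextBlob's actual code, and `str.split` stands in for a real tokenizer or tagger.

```python
from functools import cached_property


class LazyText:
    """Hypothetical wrapper for illustration; not TextBlob's implementation."""

    def __init__(self, raw):
        self.raw = raw
        self.compute_calls = 0  # counts how often the expensive step really runs

    @cached_property
    def words(self):
        # Stand-in for an expensive tokenizer or tagger call.
        self.compute_calls += 1
        return self.raw.split()


text = LazyText("ITP is a two-year graduate program")
calls_before = text.compute_calls  # construction did no work yet
first = text.words                 # the work happens here, on first access
second = text.words                # served from the cache, no recomputation
calls_after = text.compute_calls
```

Nothing at the call site says that the first `text.words` is the expensive one, which is the kind of thing that makes bottlenecks hard to spot when profiling by eye.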

Also, I hate to admit this, but the Java/Scala NLP stack is beating out most Python NLP libraries these days. NLTK _just_ got Stanford CoreNLP's best-in-class dependency parser. It's been available in Java for years.


If you're doing NLP in Python, there's no reason to use CoreNLP's parser through the NLTK wrapper. Communicating with the Java process over the file system or a socket introduces a tonne of unnecessary complications and slow-downs, invites encoding problems, etc.
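As a minimal illustration of the encoding problem (plain Python bytes, no actual CoreNLP process or socket involved; the sample string is made up): once text has to cross a process boundary as bytes, both sides must agree on the encoding, and a mismatch fails silently.

```python
text = "naïve café"

# What the sending side writes to the socket or file: UTF-8 bytes.
payload = text.encode("utf-8")

# A receiver that wrongly assumes Latin-1 decodes without any error,
# but every non-ASCII character comes out as two garbage characters.
garbled = payload.decode("latin-1")

# Decoding with the encoding the sender actually used restores the text.
restored = payload.decode("utf-8")
```

No exception is raised in the mismatched case, so the damage tends to show up later as odd tokens rather than as a crash.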

spaCy's native Cython dependency parser is both faster and more accurate than CoreNLP.

The NP chunks example from the post:

    >>> from spacy.en import English
    >>> nlp = English()
    >>> doc = nlp(u'ITP is a two-year graduate program located in the Tisch School of the Arts. Perhaps the best way to describe us is as a Center for the Recently Possible.')
    >>> for np in doc.noun_chunks:
    ...   print(np.text)
    ... 
    ITP
    a two-year graduate program
    the Tisch School
    the Arts
    the best way
    us
    a Center
spaCy looks like a great product, but it is expensive.

edit: sorry, I just noticed that it is available for free under the AGPL 3 license.

Now 100% free under the MIT license. Things change by the hour in the spaCy world :-).

It looks like a lot of good work went into spaCy. I hope that you are successful monetizing it with the MIT license.