by MojoJolo 4783 days ago
For NLP work, I really suggest using OpenNLP (http://opennlp.apache.org/) from Apache. It has libraries that can be trained to do different NLP tasks like sentence splitting, tokenization, POS tagging, and document classification. I haven't managed to use all of them yet, but in my experience it's very easy to use. The documentation is good too!

I'm a fan of OpenNLP as well, although I haven't done a lot of performance evaluation around it yet. Apache Stanbol[1] is also a very interesting project, which leverages OpenNLP (among other things) for doing semantic entity extraction from text.

Also, FWIW, I wrote an article[2] a while back, focusing on Open Source NLP tools. It was aimed slightly more at business users than developers, so it doesn't dig real deep on the tech side, but there is a list of popular OSS NLP tools that people interested in this topic might find useful.

And if I can throw in another shameless plug (only because I think it will genuinely be of interest, of course), I'll point out this post[3] on Prolog resources, since Prolog often finds application in the NLP world.

[1]: http://stanbol.apache.org

[2]: http://osintegrators.com/opensoftwareintegrators|howyoucanbe...

[3]: http://fogbeam.blogspot.com/2013/05/prolog-im-going-to-learn...

And if I can throw in another shameless plug (only because I think it will genuinely be of interest, of course), I'll point out this post[3] on Prolog resources, since Prolog often finds application in the NLP world.

You missed the nicest and most satisfying book ;):

http://www.mtome.com/Publications/PNLA/pnla-digital.html

It is simultaneously an introduction to Prolog and natural language parsing using Prolog.

Very cool. That post was originally written quite some time ago, and it was never meant to be an exhaustive list. That said, I'll add this to the list as well. Thanks for the pointer!